Digital media recognition apparatus and methods

ABSTRACT

Physical objects, including still and moving images, sound/audio and text are transformed into more compact forms for identification and other purposes using a method unrelated to existing image-matching systems which rely on feature extraction. An auxiliary construct, preferably a warp grid, is associated with an object, and a series of transformations are imposed to generate a unique visual key for identification, comparisons, and other operations. Search methods are also disclosed for matching an unknown image to one previously represented in a visual key database. Broadly, a preferred search method sequentially examines candidate database images for their closeness of match in a sequential order determined by their a priori match probability. Thus, the most likely match candidate is examined first, the next most likely second, and so forth. With respect to the recognition of video sequences and other information streams, inventive holotropic stream recognition principles are deployed, wherein the statistics of the spatial distribution of warp grid points are used to generate index keys. The invention is applicable to various fields of endeavor, including governmental, scientific, industrial, commercial, and recreational object identification and information retrieval. Extensions of the technology are also disclosed to achieve a uniform distribution of objects over the database search, a consideration which is central to scalability. In particular, a generalized method has been developed based on reticle projection, which greatly enhances the uniformity of object distributions in the collected data. Thus, whereas statistical criteria are used with respect to particular embodiments in transforming a construct associated with an image, audio, text or other representation, a reticle projection may alternatively be used in attribute transformation according to alternative embodiments of the invention.

FIELD OF THE INVENTION

[0001] This invention relates generally to digital media processing and, in particular, to methods whereby still or moving images, audio, text and other objects may be transformed into more compact forms for comparison and other purposes.

BACKGROUND OF THE INVENTION

[0002] There are many systems in common use today whose function is automatic object identification. Many make use of cameras or scanners to capture images of objects, and employ computers to analyze the images. Examples are bill changing machines, optical character readers, blood cell analyzers, robotic welders, and electronic circuit inspectors, to name a few. Each application is highly specialized, and the detailed design and implementation of each system is finely engineered to the specific requirements of the particular application, most notably the visual characteristics of the objects to be recognized. A device that is highly accurate in recognizing a dollar bill would be worthless in recognizing a white blood cell.

[0003] The more general problem of identifying an image (or any object through the medium of an image) based solely upon the pictorial content of the image has not been satisfactorily addressed. Considering that the premier model for a generalized identification system is the one which we all carry upon our shoulders, i.e., the human brain, it is not surprising that the general system does not yet exist. Any child can identify a broad range of pictures better than can any machine, but our understanding of the processes involved is so rudimentary as to be of no help in solving the problem.

[0004] As a result, the means that have been employed amount to the shrewd application of heuristic methods. Such methods generally are derived from the requirements of a particular problem. Current technology often uses such an approach to successfully solve specific problems, but the solution to the general image identification problem has remained remote.

[0005] The landscape of the patent literature referring to image identification is broad, but very shallow. The following is a summary of two selected patents and three commercial systems which are considered to represent the current state-of-the-art.

[0006] U.S. Pat. No. 5,893,095 to Jain et al. presents a detailed framework for a pictorial content based image retrieval system and even presents this framework in representative hardware. Flowcharts are given describing the operation of the framework system. The system depends for identification upon the matching of visual features derived from the image pictorial content. Examples of these visual features are hue, saturation and intensity histograms; edge density; randomness; periodicity; algebraic moments of shapes; etc. Some of these features are computed over the entire image and some are computed over a small region of the image. Jain does not reveal the methods through which such visual features are discerned. These visual features are expressed in Jain's system as “primitives”, which appear to be constructed from the visual features at the discretion of a human operator.

[0007] A set of primitives and primitive weightings appropriate to each image is selected by the operator and stored in a database. When an unknown image is presented for identification, it can either be processed autonomously to create primitives or the user can specify properties and/or areas of interest to be used for identification. A match is determined by comparing the vector of weighted primitive features obtained for the query image against all the weighted primitive feature vectors for the images in the database.

[0008] Given the information provided by Jain, one skilled in the art could not construct a viable image identification system because the performance of the system is dependent upon the skill of the operator at selecting primitives, primitive weightings, and areas of interest. Assuming that Jain ever constructed a functioning system, it is not at all clear that the system described could perform the desired function. Jain does not provide any enlightenment concerning realizable system performance.

[0009] U.S. Pat. No. 5,852,823 to De Bonet teaches an image recognition system that is essentially autonomous. Image feature information is extracted through application of particular suitable algorithms, independent of human control. The feature information thus derived is stored in a database, which can then be searched by conventional means. De Bonet's invention offers essentially autonomous operation (he suggests that textual information might be associated with collections of images grouped by subject, date, etc. to thereby subdivide the database) and the use of features derived from the whole of the image. Another point of commonality is the so-called “query by example” paradigm, wherein the search of the image database is predicated upon information extracted exclusively from the pictorial content of the unknown image.

[0010] De Bonet takes some pains to distinguish his technology from that developed by IBM and Illustra Information Technologies, which are described later in this section. He is quite critical of those technologies, declaring that they can address only a small range of image identification and retrieval functions.

[0011] De Bonet refers to the features that he extracts from images as the image's signature. The signature for a given image is computed according to the following sequence of operations: (1) The image is split into three images corresponding to the three color bands. Each of these three images is convolved with each of 25 pre-determined and invariant kernels. (2) The 75 resulting images are each summed over the image's range of pixels, and the 75 sums become part of the image's signature. (3) Each of the 75 convolved images is again convolved with the same set of 25 kernels. Each of the resulting 1875 images is summed over its range of pixels, and the 1875 sums become part of the image's signature. (4) Each of the 1875 convolved images is convolved a third time with the same set of 25 kernels. The resulting 46,875 images are each summed over the image's range of pixels, and the 46,875 sums become part of the original image's signature.

[0012] In the simplest case, then, the 48,825 sums (46,875+1875+75) serving as the signature are stored in an image database, along with ancillary information concerning the image. It should be noted that this description was obtained from De Bonet's invention summary. Later, he uses just the 46,875 elements obtained from the third convolution. An unknown image is put through the same procedure. The signature of the unknown image is then compared to the signatures stored in the database one at a time, and the best signature matches are reported. The corresponding images are retrieved from an image library for further examination by the system user.
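The element counts quoted above follow directly from applying 25 kernels to every image of the previous level. A minimal sketch of that arithmetic is given here; the function name and its parameters are hypothetical and are not taken from De Bonet.

```python
def signature_sizes(color_bands=3, kernels=25, levels=3):
    """Return the number of summed images produced at each convolution level."""
    sizes = []
    count = color_bands
    for _ in range(levels):
        # Every image from the previous level is convolved with every kernel.
        count *= kernels
        sizes.append(count)
    return sizes


sizes = signature_sizes()
print(sizes, sum(sizes))  # [75, 1875, 46875] 48825
```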

[0013] In a somewhat more complex scenario, it is posited that the system user has a group of images that are related in some way (all are images of oak trees; all are images of sailboats; etc.). With the signatures of each member of the group already calculated, the means and variances of each element of their signatures (all 48,825) are computed, thereby creating a composite signature representing all member images of the group, along with a parallel array of variances. When a signature in the database is compared to a given signature, the difference between each corresponding element of the signatures is inversely weighted by the variance associated with that element. The implicit assumption upon which the weighting process is based is that elements exhibiting the least variance would be the best descriptors for that group. In principle, the system would return images representative of the common theme of the group.
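The composite signature and its inverse-variance weighting can be sketched as follows. This is only an illustration under stated assumptions: the use of a squared difference, the small constant `eps` that guards against zero variance, and both function names are hypothetical and are not described in the source.

```python
from statistics import mean, pvariance


def composite_signature(signatures):
    """Element-wise means and variances over a group of equal-length signatures."""
    columns = list(zip(*signatures))
    means = [mean(column) for column in columns]
    variances = [pvariance(column) for column in columns]
    return means, variances


def weighted_distance(signature, means, variances, eps=1e-9):
    """Sum of squared differences, each divided by that element's variance.

    Assumption: squared differences and the eps guard are illustrative choices.
    """
    return sum(
        (s - m) ** 2 / (v + eps)
        for s, m, v in zip(signature, means, variances)
    )
```

With this weighting, a deviation in an element that barely varies across the group is penalized far more heavily than the same deviation in an element that varies widely, which matches the stated assumption that low-variance elements are the best descriptors.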

[0014] Additionally, such composite signatures can be stored in the image database. Then, when a signature matching a composite signature is found, the system returns a group of images which bear a relation to the image upon which the search was based.

[0015] The system is obviously very computation-intensive. De Bonet used a 200 MHz computer based upon the Intel Pro processor to generate some system performance data. He reports that a signature can be computed in 1.5 minutes. Using a database of 1500 signatures, image retrieval took about 20 seconds. The retrieval time should be a linear function of database size.

[0016] In terms of commercial products, Cognex, Inc. offers an image recognition system under the trademarked name “Patmax” intended for industrial applications concerning the gauging, identification and quality assessment of manufactured components.

[0017] The system is trained on a comprehensive set of parts to be inspected, extracting key features (mostly geometrical) and storing them in a file associated with that particular part. Thereafter, the system is able to recognize that part under a variety of conditions. It is also able to identify independent of object scale and to infer part orientation.

[0018] In the early to mid 1990's, IBM (Almaden Research Center) developed a general-purpose image identification/retrieval system. Reduced to software that runs under the OS/2 operating system, it has been offered for sale as Ultimedia Manager 1.0 and 1.1, successively.

[0019] The system identifies an image principally according to four kinds of information:

[0020] 1. Average color, calculated by simply adding all of the RGB color values in each pixel.

[0021] 2. Color histogram, in which the color space is divided into 64 segments. A heuristic method is used to compare one histogram to another.

[0022] 3. Texture, defined in terms of coarseness, contrast and direction. These features are extracted from gray-level representations of the images.

[0023] 4. Shape, defined in terms of circularity, eccentricity, major axis direction, algebraic moments, etc.

[0024] In addition to the distinguishing information noted above, which can be extracted from a given image automatically, the IBM system is said to have means through which a user can supplement the information extracted automatically by manually adding information such as user-defined shapes, particular areas of interest within the image, etc.

[0025] The system does not rank the stored images in terms of the quality of match to an unknown, but rather selects 20-50 good candidates, which must then be manually examined by a human. Thus, it can barely be called an image identification system.

[0026] Illustra developed a body of technology to be used for image identification and retrieval. Informix acquired Illustra in 1996.

[0027] The technology employed is the familiar one of extracting the attributes related to color, composition, structure and texture. These attributes are translated into a standard language, which fits into a file structure. Unknown images are decomposed by the same methods into terms that can be used to search the file structure. The output is said to return possible matches, ordered from the most to the least probable. The information extracted from the unknown image can be supplemented or replaced by input data supplied by the user.

[0028] Aside from the general purpose of image identification and retrieval (by Informix's Excalibur System), this technology has been applied to the archiving and retrieval of video images (by Virage, Inc. and Techmath).

[0029] Management of information is one of the greatest problems confronting our society. As the sheer volume of generated information increases dramatically every year, effective and efficient access to stored information becomes a particular concern.

[0030] While information in its physical embodiment was once stored in file cabinets, libraries, archives and the like, to be accessed through arcane means such as the Dewey Decimal System, current needs dictate that information must be stored as digital data in electronic media. Database management systems have been developed to identify and access information that can be simply and uniquely described through their alphanumeric keywords. A document entitled “New Varieties of Wheat” appearing in the Journal of Agronomy, series 10, volume 3, Jan. 4, 1999, is easy to digitize, store and retrieve. The search mechanism, given all of the identifications above, can be swift, efficient and foolproof. Similarly, cross-referencing according to field of interest, subject matter, etc. works rather well.

[0031] Currently, however, much of the information with which we are confronted is presented in pictorial form. Though we can create arbitrarily accurate representations of objects in pictorial form, such as digital images, and can readily store such images, the accessing and retrieving of this information often presents difficulties. For the sake of the present discussion, the term “digital image” is defined as a facsimile of a pictorial object wherein the geometrical and chromatic characteristics are represented in digital form.

[0032] Many such images can be stored and retrieved efficiently and accurately through associated alphanumeric keywords, i.e., meta-data. The associated information Claude Monet-Poppies-1892 might allow the unique identification and retrieval of a famous painting. Graphics used for advertising might be identified by the associated information of the date of creation, the subject matter and the creating advertisement agency. But if one considers the cases of an unattributed painting or undocumented pictorial advertising copy, i.e., no meta-data, such identifications become more problematic.

[0033] There are innumerable instances in which one has only the digital image on hand (one can always generate a digital image from a physical object if need be) and it is desired to access information in a database concerning its identification, its original nature, etc. In such cases, the seeker has no information with which to search an appropriate database, other than the information of the image itself.

[0034] Consider some examples of the cases noted above.

[0035] (1.) Let us postulate that a person had a swatch of fabric having a particular pattern of colors, shapes, textures, etc. Further, let us assume that the swatch has no identifying labels. The person wishes to identify the textile. Assuming that a catalog of all fabrics existed, the person might be able to narrow the search through observation of the type of fabric and the like, but, in general, the person would have no choice but to visually compare his sample fabric to all the other fabrics, one at a time.

[0036] (2.) It is desired to identify an unknown person in a photograph, when the person is not otherwise identified, but is thought to be pictorially represented in a database, for example, a database of all passport pictures. Except for the obvious partitions according to sex of subject, age of subject, and other meta-data sortings, there exists no effective way to identify the person in the photograph other than through direct comparison by humans with all the pictures in the database.

[0037] (3.) A person possesses a porcelain dinner plate of unknown origin, which is believed to be valuable due to the observable characteristics of the object. The person wishes to ascertain the history of and the approximate value of the plate. In this case, the pictorial database exists mostly in reference books and in the minds of experts. Assuming the first case, the person must compare the object to images stored in the appropriate books, image by image. In the second case, the person must identify an appropriate expert, present the expert with the object or pictorial representations of the object, and hope that the expert can locate the proper reference in the database or provide the required information from memory.

[0038] In all the examples presented above, the problem solution rests upon humans visually comparing objects, or images of objects, to images in a database. As current and future electronic media generate, store and transmit an ever-increasing torrent of images, for a multitude of purposes, it is certain that a great many of these images will be of sufficient importance that it will be imperative for the images themselves to serve as their own descriptors, i.e., no meta-data. The problem of manually associating keyword descriptions, i.e., meta-data, with every digitally stored image to permit rapid retrieval from image databases very quickly becomes unmanageable as the number of pertinent images grows.

[0039] Assuming, then, that an image's composition itself must somehow serve as an image's description in image databases, we immediately are faced with the problem that the compositions of pictorial images are presented in a language that we neither speak nor understand. Images are composed of shapes, colors, textures, etc., rather than of words or numbers.

[0040] At a most basic level, a digitized image can be completely described in terms of hue, saturation and intensity at each pixel location. There is no more information to be had from the image. Furthermore, this definition of an image is the one definition currently existing which is universal and is presented in a language which all can understand. Viewed from this perspective, it is worth investigating further.

[0041] The naive approach to identifying an unknown image by associating it with a stored image found within a given database of digitized images would be to compare a digitized facsimile of the unknown image to each image in the database on a pixel by pixel basis. When each pixel of a stored image is found to match each pixel of the unknown image, a match between that particular stored image and the unknown image can be said to have occurred. The unknown image can now be said to be known, to the extent that the ancillary information attached to the stored image can now be associated with the unknown image.

[0042] When considered superficially, the intuitive procedure given above seems to offer a universal solution to the problem of managing image databases. Practical implementation of such an approach, however, presents a plethora of problems. The process does not provide any obvious means for subdividing the database into smaller segments, one of which can be known a priori to contain the unknown image. Thus, the computer performing the comparisons must do what a human would have to do: compare each database image to the unknown image one at a time on a pixel-by-pixel basis. Even for a high-speed computer, this is a very time-consuming process.

[0043] In many cases, the database images and the unknown image are not geometrically registered to each other. That is, because of relative rotation and/or translation between the database image and the unknown image, a pixel in the first image will not correspond to a pixel in the second. If the degree of relative rotation/translation between the two images is unknown or cannot be extracted by some means, identification of an unknown image by this method becomes essentially impossible for a computer to accomplish. Because a pixel-by-pixel comparison, commonly referred to as template matching, seems to be such an intuitively obvious answer to the problem, it has been analyzed and tested extensively and has been found to be impractical for any but the simplest applications of image matching, such as coin or currency recognition.
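The pixel-by-pixel procedure and its sensitivity to registration can be illustrated with a short sketch. Images are represented here as nested lists of pixel values, and the function names are hypothetical; the code only illustrates the naive approach discussed above.

```python
def images_match(a, b):
    """True only when both images have the same size and every pixel is equal."""
    return len(a) == len(b) and all(row_a == row_b for row_a, row_b in zip(a, b))


def identify_by_template(unknown, database):
    """Exhaustively compare the unknown image with every stored image."""
    for name, stored in database.items():
        if images_match(unknown, stored):
            return name
    return None


def shift_right(image):
    """Translate an image by one pixel, filling the vacated column with zeros."""
    return [[0] + row[:-1] for row in image]
```

Every stored image must be visited in turn, and translating the unknown image by even a single pixel (as `shift_right` does) is enough for the exact comparison to fail.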

[0044] All other image recognition schemes with which we are familiar are based upon the extraction of distinctive features from an unknown image and correlation of such features with a database of like features, with each feature set having been similarly extracted from and related to each stored image. The term pattern recognition has come to represent all such methods. Examples of such feature sets, which can be extracted and used, might be line segments, defined, perhaps, by the locations of the endpoints, by their orientation, by their curvature, etc. The reduction of images to feature sets is always an attempt to translate image composition, for which there is no language, into a restrictive dictionary of image features.

[0045] The selection of feature sets and their application to image matching have been investigated intensely. The feature sets used have been largely based upon the intuition of the process designer. Some systems of feature matching have performed quite well in image matching problems of limited scope (such as identifying a particular manufactured part as being of a pre-defined class of similar parts; distinguishing between a military tank and a military truck, etc.). However, no system has yet solved the general problem of matching an unknown image to its counterpart in an image database.

SUMMARY OF THE INVENTION

[0046] The methods and apparatus of this invention present an effective means for addressing the general problem of image and digital media recognition described above. The invention does not depend upon feature extraction, and is not related to any other image- or content-matching system.

[0047] The method derives from the study of certain stochastic processes, commonly referred to as chaos theory, in particular, the study of strange attractors. In this method, an auxiliary construct, a chaotic system, is associated with an “image,” which should be taken to include any object or representation to which the invention is applicable, including image sequences and motion video, audio and other waveforms, including speech, and text.

[0048] The auxiliary construct is a dynamic system whose behavior is described by a system of linear differential equations whose coefficients are dynamically derived from the values of the pixels in the digital image. As the dynamic system is successively iterated, it is observed that the system converges towards an attractor state, that is, random behavior becomes predictable and the system reaches an equilibrium configuration. The equilibrium configuration uniquely represents the digital image upon which it has been constructed.

[0049] The form of the auxiliary construct that has been commonly used during the development of this invention is a rectangular, orthogonal grid, though the invention does not depend upon any particular grid form. It is assumed hereafter that a rectangular auxiliary grid is used, and it will hereafter be referred to as the warp grid. The warp grid is assigned a particular mesh scale and location relative to the original image. The locations of all grid intersections are noted and stored.

[0050] A series of transformations is then imposed upon the warp grid. Each transformation is governed by a given set of transformation rules which use the current state of the warp grid and the information contained in the invariant underlying original image. The grid intersections will generally translate about the warp grid space as the result of each transformation. However, the identity of each intersection is maintained. At each iteration of the warp grid, the image is sampled at the warp grid points. The number of warp grid points is many orders of magnitude smaller than the number of pixels in the digital image, and the number of iterations is on the order of a hundred. The total number of computational steps is well within the capabilities of ordinary personal computers to implement very rapidly. After a given number of transformations have been performed upon the warp grid, the final position of each of the grid intersections is noted. For each grid point, a vector is formed between its original position and its final position. The set of all such vectors, corresponding to all of the original grid points, constitutes a unique representation of the underlying original image, called a Visual Key.
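The overall flow (place a grid, iterate while sampling the image at the grid points, then record the displacement of each point) can be sketched as below. The transformation rules are not stated in this section, so the rule used here is purely an illustrative assumption: each point moves a fraction `rate` of the way toward the centroid of its four grid neighbours, weighted by the image value sampled at each neighbour. The function names, the grid size, and the nearest-pixel sampling are likewise hypothetical.

```python
def sample(image, x, y):
    """Nearest-pixel sample, clamped to the image bounds."""
    h, w = len(image), len(image[0])
    col = min(max(int(round(x)), 0), w - 1)
    row = min(max(int(round(y)), 0), h - 1)
    return image[row][col]


def visual_key(image, rows=4, cols=4, iterations=100, rate=0.1):
    """Return one (dx, dy) displacement vector per warp grid point."""
    h, w = len(image), len(image[0])
    # Initial warp grid: rows x cols intersections spread evenly over the image.
    start = [[((c + 0.5) * w / cols, (r + 0.5) * h / rows) for c in range(cols)]
             for r in range(rows)]
    grid = [row[:] for row in start]
    offsets = ((-1, 0), (1, 0), (0, -1), (0, 1))
    for _ in range(iterations):
        # Sample the unchanging image at the current grid positions.
        weights = [[sample(image, x, y) + 1e-6 for (x, y) in row] for row in grid]
        new_grid = []
        for r in range(rows):
            new_row = []
            for c in range(cols):
                x, y = grid[r][c]
                nbrs = [(r + dr, c + dc) for dr, dc in offsets
                        if 0 <= r + dr < rows and 0 <= c + dc < cols]
                total = sum(weights[i][j] for i, j in nbrs)
                cx = sum(weights[i][j] * grid[i][j][0] for i, j in nbrs) / total
                cy = sum(weights[i][j] * grid[i][j][1] for i, j in nbrs) / total
                # Assumed rule: step toward the weighted centroid of the neighbours.
                new_row.append((x + rate * (cx - x), y + rate * (cy - y)))
            new_grid.append(new_row)
        grid = new_grid  # point identity is preserved by its (r, c) index
    return [(grid[r][c][0] - start[r][c][0], grid[r][c][1] - start[r][c][1])
            for r in range(rows) for c in range(cols)]
```

Only the grid points are touched at each iteration, so the cost is proportional to the number of grid points times the number of iterations rather than to the number of pixels.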

[0051] This resultant set of vectors represents a coherent language through which we can compare and identify distinct images. In the preferred embodiment, to address the problem of matching an unknown image to an image in a database, we could use the following procedure. First, we would apply a given warp grid iterative process to each original image. From each such procedure we would obtain a vector set associated with that image, and the vector set would be stored in a database. An unknown image that had a correspondent in the database could be processed in the same way and identified through matching the resultant vector set to one of the vector sets contained in the database. Of course, auxiliary information commonly used for database searching, such as keywords, could also be used in conjunction with the present invention to augment the search process.

[0052] The size of the vector set is small compared to the information contained in the image. The vector set is typically on the order of a few kilobytes. Thus, even if the database were to be searched exhaustively to find a match to an unknown image's vector set, the search process will be fairly rapid even for a database containing a significant number of vector sets. Of greater importance is the fact that the database used for identification of unknown images need not contain the images themselves, but only the vector sets and enough information to link each vector set to an actual image. The images themselves could be stored elsewhere, perhaps on a large, remote, centrally located storage medium. Thus, a personal computer system, which could not store a million images, could store the corresponding million information sets (vector sets plus identification information), each of a few kilobytes in size. As has been mentioned, the personal computer would be more than adequate to apply the image transformation operations to an unknown image in a timely manner. The personal computer could compute the vector set for the unknown image and then could access the remote storage medium to retrieve the desired image identification information.

[0053] In practice, however, the matching of vector components can be too slow to allow a very large database of many millions of images to be searched in a timely manner. As noted in the following, there may not be a perfect match between a vector set derived from an unknown image and a vector set stored in the database. A unique search method dealing with this uncertainty, which is also very fast and efficient, will be described herein.

[0054] The unknown image and the corresponding database image will generally have been made either with two different imaging devices, by the same imaging device at different times, or under different conditions with different settings. In all cases, any imaging device is subject to uncertainties caused by internal system noise. As a result, the unknown image and the corresponding image in the database will generally differ. Because the images differ, the vector sets associated with each will generally differ slightly. Thus, as noted above, a given vector set derived from the unknown image may not have an exact correspondent in the database of vector sets. A different aspect of the invention addresses this problem and simultaneously increases the efficiency of the search process.

[0055] The search process employed by this invention for finding a corresponding image in a database is called squorging, a newly coined term derived from the root words sequential and originating. The method sequentially examines candidate database images for their closeness of match in a sequential order determined by their a priori match probability. Thus, the most likely match candidate is examined first, the next most likely second, and so forth. The process terminates when a match of sufficient closeness is found, or a match of sufficient closeness has not been found in the maximum allowable number of search iterations.

[0056] The squorging method depends upon an index being prefixed to each image vector set in the database. A pre-selected group of j warp grid points is used to construct the index. Each x and y component of the pre-selected group of warp grid vectors is quantized into two intervals, represented by the digits 0 and 1. In effect, each vector set has been recast as a set of 2*j lock tumblers, with each tumbler having 2 positions. Associated with each vector set in the database, then, is a set of 2*j tumblers, each of which is set to one of 2 values. The particular value of each tumbler is determined by which interval the vector component magnitude is quantized into.

[0057] At this point in the process, every entry in the database is associated with a set of 2*j tumblers, with each tumbler position determined by the underlying vector set components. These tumbler sets are referred to as index keys. Note that there is not necessarily a one-to-one relationship between vector sets and index keys in the database. A single index key can be related to several vector sets.
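The construction of an index key from j selected grid vectors, and the grouping of several vector sets under one key, can be sketched as follows. The per-component `thresholds` that separate the two quantization intervals are an assumption (the source does not say here how the interval boundary is chosen), and the function names and the `(name, vectors)` entry layout are hypothetical.

```python
from collections import defaultdict


def index_key(vectors, selected, thresholds):
    """Quantize the x and y components of j selected vectors into 2*j tumblers.

    vectors:    list of (dx, dy) warp grid vectors for one image
    selected:   indices of the j pre-selected grid points
    thresholds: one (tx, ty) pair per selected point (assumed boundary values)
    """
    bits = []
    for (tx, ty), idx in zip(thresholds, selected):
        dx, dy = vectors[idx]
        bits.append(1 if dx >= tx else 0)
        bits.append(1 if dy >= ty else 0)
    return tuple(bits)


def build_index(entries, selected, thresholds):
    """Group (name, vectors) entries by index key; one key may hold several sets."""
    index = defaultdict(list)
    for name, vectors in entries:
        index[index_key(vectors, selected, thresholds)].append((name, vectors))
    return index
```

Because the key is a tuple of 2*j binary digits, it can serve directly as a dictionary key, and any two vector sets that quantize identically land in the same bucket.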

[0058] Returning to the unknown image, selected elements of its vector set are similarly recast into an index key. However, in the case of the unknown, statistics which are known a priori are used to calculate the most probable index key associated with the unknown image, the next most probable, and so on. The index keys are calculated on demand in order of decreasing probability of the unknown index key being the correct one.

[0059] These index keys are checked sequentially against the index keys in the database until one is calculated having an exact correspondent in the database of index keys. Note that not all of the index keys in the list necessarily have exact matches in the database of index keys. If the first index key on the list matches an index key in the database, all vector sets associated with that index key are examined to determine the closest match to the vector set associated with the unknown image. Then the corresponding database image is said to most probably be the unknown image. Likewise, the second, third, etc. most probable matches can be identified.

[0060] If a match is not found within the scope of the first index key, the first index key calculated is discarded, and the next most probable index key is calculated. The squorging operation determines which tumblers in the index key to change to yield the next most probable index key. The process is repeated until a satisfactory match between the Visual Key Vector associated with the unknown image and a Visual Key Vector in the database is found.
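One way to generate index keys on demand in order of decreasing probability is a best-first enumeration over sets of flipped tumblers. The sketch below is not the patented squorging procedure itself; it assumes that the tumblers are statistically independent, that each observed tumbler value has a known probability p (0.5 <= p < 1) of being correct, and that closeness is measured by a sum of squared differences. All names are hypothetical.

```python
import heapq
import math


def keys_by_probability(observed, confidence):
    """Yield index keys from most to least probable.

    observed:   tuple of 0/1 tumbler values computed from the unknown image
    confidence: confidence[i] is the assumed probability that observed[i] is correct
    """
    # Flipping tumbler i multiplies the key probability by (1-p)/p; use log-odds as cost.
    costs = sorted((math.log(p / (1.0 - p)), i) for i, p in enumerate(confidence))
    yield tuple(observed)  # no flips: the most probable key
    if not costs:
        return
    # Heap entries: (total cost, position of last flip in `costs`, flipped positions).
    heap = [(costs[0][0], 0, (0,))]
    while heap:
        total, last, flipped = heapq.heappop(heap)
        key = list(observed)
        for pos in flipped:
            key[costs[pos][1]] ^= 1
        yield tuple(key)
        nxt = last + 1
        if nxt < len(costs):
            step = costs[nxt][0]
            # Either flip one more tumbler, or swap the last flip for the next one.
            heapq.heappush(heap, (total + step, nxt, flipped + (nxt,)))
            heapq.heappush(heap, (total - costs[last][0] + step, nxt,
                                  flipped[:-1] + (nxt,)))


def distance(a, b):
    """Sum of squared component differences between two vector sets (assumed metric)."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))


def squorge(observed, confidence, database, unknown, tolerance, max_keys=1000):
    """Try keys in probability order; return the name of the first close-enough match."""
    for n, key in enumerate(keys_by_probability(observed, confidence)):
        if n >= max_keys:
            break  # maximum allowable number of search iterations reached
        candidates = database.get(key, [])
        if candidates:
            name, vectors = min(candidates, key=lambda e: distance(unknown, e[1]))
            if distance(unknown, vectors) <= tolerance:
                return name
    return None
```

Each key is produced only when the previous one has failed, so a search that succeeds on the first or second key never pays for enumerating the rest.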

[0061] The squorging method does not perform very well when the individual picture objects are individual frames of a movie or video stream. The high degree of frame-to-frame correlation necessary to convey the illusion of subject motion means that individual warp grid vectors are likely to be significantly correlated. This results in an undesirably sparse distribution of index keys with some of the index keys being duplicated very many times. Therefore, in order to extend the present invention to the recognition of streams, additional algorithms referred to as “Holotropic Stream Recognition” are presented.

[0062] Holotropic Stream Recognition (HSR) employs the warp grid algorithm on each frame of the picture object stream, but rather than analyzing the warp grid vectors themselves to generate index keys, HSR analyzes the statistics of the spatial distribution of warp grid points in order to generate index keys. Furthermore, rather than employing fixed threshold levels to define individual tumbler probabilities, the HSR methodology constructs a dynamic decision tree whose threshold levels are individually adjusted each time an individual tumbler probability is generated. Finally, the method of squorging itself is replaced by a statistical inference methodology, which is effective precisely because the individual frames of a picture object stream are highly correlated.

[0063] Extensions of the technology are also disclosed to achieve a uniform distribution of objects over the database search, a consideration which is central to scalability. In particular, a generalized method has been developed based on reticle projection, which greatly enhances the uniformity of object distributions in the collected data. Thus, whereas statistical criteria are used with respect to particular embodiments in transforming a construct associated with an image, audio, text or other representation, a reticle projection may alternatively be used in attribute transformation according to alternative embodiments of the invention.

[0064] With specific regard to digital audio, the invention may be used to create large databases representing popular music which can be successfully queried by 1-2 second (or longer) excerpts of such digital material. As to digital text, the method presented herein offers true fuzzy search capability. A query phrase can deviate substantially from the corresponding database phrase and a correct match will still be achieved. Within rather loose limits, words within a phrase can be misspelled, words can be deleted and/or added, and words can be transposed without preventing a correct match. It is anticipated that application of the text-matching methodology will make text search engines much more useful. Also, because of its unique capabilities, this methodology is expected to foster new applications which current technology has not allowed.

BRIEF DESCRIPTION OF THE DRAWINGS

[0065] Picture Example 1 is a movie frame to which the principles of the invention are applicable;

[0066] Picture Example 2 is an action painting to which the principles of the invention are applicable;

[0067] Picture Example 3 is a photograph showing exceptional qualities of shadow, light and tonal value;

[0068] Picture Example 4 is a dog show catalog page to which the principles of the invention are applicable;

[0069] FIG. 1 is a flowchart of the process of loading the Visual Key Database;

[0070] FIG. 2 is a flowchart of the process of querying the Visual Key Database;

[0071] FIG. 3 is a flowchart of the process of computing a Visual Key Vector;

[0072] FIG. 4 is an example Picture from the front of a 1985 Topps Mark McGwire Rookie Baseball Card. This picture is used throughout the illustrations of the Warp Grid adaptation process;

[0073] FIG. 5 is the same Picture as FIG. 4, showing only the red channel of the Digital Image;

[0074] FIG. 6 shows a 16-by-24 Warp Grid plotted in the UV Coordinate System;

[0075] FIG. 7 shows an Initialized Warp Grid (16×24) superimposed on a Digital Image;

[0076] FIG. 8 shows a quadrilateral superimposition of a rectangular Warp Grid on a perspective-distorted Digital Image;

[0077] FIG. 9 is a flowchart of the process of computing a Warp Grid Vector;

[0078] FIG. 10 is a flowchart of the process of adapting the Warp Grid a single step;

[0079] FIG. 11 is a graph showing the relationship between the magnitude of Warp Grid Vectors and the number of iterations of the Warp Algorithm. This illustrates the tendency of Warp Grid Vectors to come to a state of equilibrium;

[0080] FIG. 12 illustrates three possible Connectivity Patterns for an Initialized Warp Grid, showing a Neighborhood Radius of 1, 2 and 3, respectively;

[0081] FIG. 13 shows two different representations of the concept of Toroidal Wrapping of the Neighborhood Points. The center of a given Neighborhood Connectivity Pattern is treated as if it is in the very center of an image that fully wraps around (edgeless);

[0082] FIG. 14 compares the results of 2 different Warp Rates (WR), illustrating that the WR does not have a significant impact on the resulting Equilibrium Configuration;

[0083] FIG. 15 shows Warp Grid results using 3 different Connectivity Patterns. Although the effect is not drastic, the larger the Connectivity Pattern, the greater the influence of the large, bright regions in the picture;

[0084] FIG. 16 illustrates the initial configuration and the first two iterations of the Warp Algorithm. The cross hairs on the first two pictures represent the calculated Center-of-gravity of the Neighborhood Configuration, toward which all of the points will be adjusted;

[0085] FIG. 17 shows a Warp Grid Adaptation after a single step;

[0086] FIG. 18 shows the same Warp Grid Adaptation as in FIG. 17, after three steps. In this figure, the intermediate steps are also shown;

[0087] FIG. 19 shows the same Warp Grid Adaptation as in FIG. 17 and FIG. 18, after 250 steps;

[0088] FIG. 20 shows the Warp Grid Vectors for the Warp Grid after it has reached its equilibrium state;

[0089] FIG. 21 illustrates a Digital Image and a corresponding Warp Grid that is much finer (96 rows and 64 columns). The Warp Grid is fully adapted in an Equilibrium Configuration;

[0090] FIG. 24 is a flowchart of the process of generating Tumbler Probabilities;

[0091] FIG. 25 is a flowchart of the process of Recursive Squorging;

[0092] FIG. 26A is a Basic Squorger diagram, showing how the Squorger functions at the highest level;

[0093] FIG. 26B shows a decomposition of the Basic Squorger diagram, showing how the nested Squorgers accomplish the work;

[0094] FIG. 27 is a flowchart of the Squorger next method. This is the method that is used to request the Squorger to fetch the next most likely Index Key from among the possible combinations;

[0095] FIG. 28A, FIG. 28B and FIG. 28C detail the process of combining two lists in a Squorger;

[0096] FIG. 28A shows a Squorger combining two lists, with three connections made thus far;

[0097] FIG. 28B details the step of finding the fourth connection. The next candidate connections are shown in the box on the right;

[0098] FIG. 28C shows the Squorger with nine connections made. At this point, the first element in listA has been combined with every element of listB;

[0099] FIG. 30A, FIG. 30B and FIG. 30C illustrate the process of Holotropic Stream Database Construction;

[0100] FIG. 30A illustrates collecting the statistics;

[0101] FIG. 30B illustrates constructing the decision tree;

[0102] FIG. 30C illustrates constructing the reference bins;

[0103] FIG. 31 shows the process of Holotropic Stream Query Recognition;

[0104] FIG. 32 shows a demonstration Visual Key Player;

[0105] FIG. 33 shows a Query Stream Recognition Plot;

[0106] FIG. 34 shows a Query Stream Tropic Incidence Diagram;

[0107] FIG. 35 shows a Holotropic Storage Incidence Diagram;

[0108] FIG. 36 is a flowchart of the AutoRun subroutine;

[0109] FIG. 37 is a flowchart of the initialize WarpGrid subroutine;

[0110] FIG. 38 is a flowchart of the sample WarpGrid subroutine;

[0111] FIG. 39 is a flowchart of the adapt WarpGrid subroutine;

[0112] FIG. 40 is a flowchart of the computeStatistics subroutine;

[0113] FIG. 41 is a flowchart of the Learn subroutine;

[0114] FIG. 42 is a flowchart of the computeDecisionTree subroutine;

[0115] FIG. 43 is a flowchart of the statMedian subroutine;

[0116] FIG. 44 is a flowchart of the stuffReferenceBins subroutine;

[0117] FIG. 45 is a flowchart of the Recognize subroutine;

[0118] FIG. 46 is a flowchart of the computeIndexKeys subroutine;

[0119] FIG. 47 is a flowchart of the computeQueryTropic subroutine;

[0120] FIG. 48 is a flowchart of the computeRecognitionHistogram subroutine;

[0121] FIGS. 49A and 49B together form a flowchart of the process of displaying the Holotropic Stream Query Recognition Results;

[0122] FIG. 50 depicts a media content indexing application according to the invention;

[0123] FIG. 51 concerns the reticle projection process, and shows two stages of the process used in computing the full projection in constructing the reticle;

[0124] FIG. 53 shows the process of a compute gene;

[0125] FIG. 54 illustrates the compute projection;

[0126] FIG. 55 shows the net nucleotides from projections;

[0127] FIG. 56 helps to appreciate that once the value of a bit in the gene is determined, it must be determined which codon of the gene is affected, and which bit of that codon should be set to the determined value;

[0128] FIG. 57 is a diagram that shows when a frame of a media file is learned, a gene representing it is added to the Media Catalog;

[0129] FIG. 58 shows that when a query frame is compared to the Media Catalog (MC), a histogram (H) is prepared;

[0130] FIG. 59 shows the basic action of a shift register;

[0131] FIG. 60 illustrates the optical reticle projection concept in a single dimension;

[0132] FIG. 61 illustrates this basic configuration for two-dimensional images and reticles; and

[0133] FIGS. 62A-62E illustrate the specific example of a 7-by-9 reticle implemented as an optical reticle mask. The numbers in the figures represent individual pixels of the reticle, and weight transmitted light rays by +1 or −1.

LIST OF DEFINITIONS

[0134] Composition

[0135] A specific spatial relationship among various composited primitive visual elements including color, shape, dots, arcs, symbols, shading, texture, patterns, etc.

[0136] Connectivity Pattern

[0137] The definition of the set of Warp Grid points that directly affect the movement of a given point.

[0138] Decision Tree

[0139] A construct for converting Visual Key Statistics into Index Keys, explicitly constructed from the Reference Stream Statistics File. The Decision Tree maps individual media frames into Index Keys.

[0140] Digital Image

[0141] An image which exists as an ordered set of digital information. A digital image could be created entirely within the digital domain or could be created by converting an existing picture into a digital counterpart consisting of ordered digital data. An appropriate viewing device is required to produce a representation of the image.

[0142] Displacement Vectors

[0143] Measurements derived by adapting the points on a Warp Grid over a Digital Image of a Picture. Each Displacement Vector represents the distance moved by an individual point in the Warp Grid after a given number of iterations.

[0144] Equilibrium Warp Grid

[0145] The deterministic outcome resulting from the indefinitely continued application of geometric modifications to a Warp Grid referred to as adapting steps. The Equilibrium Warp Grid is a configuration of Warp Grid points that either does not change with additional adaptation iterations or changes very little. The Equilibrium Warp Grid in the form of a Visual Key Vector represents the picture that it was adapted to in the Visual Key database.

[0146] Holotropic

[0147] The term used to describe the process of recognizing a Stream based on Reference Stream Statistics. A word formed by conjoining holo (meaning “whole”) and tropic (meaning “turning towards”).

[0148] Index Key

[0149] An Index Key is an alphanumeric string that is derived from a Visual Key Vector, used for indexing a large database of Visual Key Vectors.

[0150] Initial Warp Grid

[0151] A Warp Grid as it is first configured, before the Warp Algorithm has adapted its points.

[0152] Match Score

[0153] A measure of the degree to which a particular entry in the database matches the Query Picture. In the Preferred Embodiment, a perfect match corresponds to a match score of 100, while the worst possible match corresponds to a score of 0.

[0154] Neighborhood Points

[0155] The set of points (defined by the Connectivity Pattern) that directly affect the movement of a given point.

[0156] Neighborhood Radius

[0157] A Connectivity Pattern defined by the points in a Warp Grid that are directly adjacent and completely surrounding a given Warp Grid point.

[0158] Picture

[0159] A composition of visual elements to which the observer attaches meaning.

[0160] Picture Content

[0161] The meaning an observer attaches to a Picture's composition. Examples are the Picture's subject, setting and depicted activity.

[0162] Picture Context

[0163] The circumstances of a picture's existence, such as the creator, the date of creation and the current owner.

[0164] Picture Object

[0165] A container which holds information that completely describes the composition of a Picture. Picture Objects can be visible, as in a photograph, or virtual, as in a stored Digital Image.

[0166] Picture Object Collection

[0167] A specific set of Picture Objects.

[0168] Picture Representation

[0169] A facsimile of the Picture used to designate the Picture visually.

[0170] Picture Stream

[0171] A specific sequence of pictures which, when presented to an observer by an appropriate apparatus at an appropriate rate, will appear to the observer as depicting continuous motion.

[0172] Query Picture

[0173] An image presented to the Visual Key database system for identification.

[0174] Query Stream

[0175] A stream presented to the Visual Key database system for identification.

[0176] Reference Bins

[0177] Holders for Reference Stream frame numbers, sorted according to their assigned Index Keys. These are used, with the Decision Tree, in the process of Holotropic Stream Query Recognition.

[0178] Reference Stream

[0179] A stream composited of individual learned streams, forming the basic data used for recognizing a Query Stream.

[0180] Squorger

[0181] A computer software component that combines two input lists, delivering their joined elements in order of decreasing probability.

[0182] Squorger Tree

[0183] A logical tree structure using Tumbler values and associated Tumbler Probabilities as inputs, while the single output delivers Index Keys in order of decreasing probability that the output Index Key is the correct one.

[0184] Squorging Algorithm

[0185] A deterministic set of operations applied to the squorging tree which guarantees that the desired sequence of Index Keys will appear at the tree output when a request is submitted for the next most probable Index Key.

[0186] Streaming Images

[0187] Using an auxiliary device to generate/transmit/display an ordered sequence of images. Examples are the use of a movie projector to stream films or a DVD player to stream recorded video.

[0188] Tropic

[0189] A graphical line segment indicating the trajectory and duration of a Stream. A Query Tropic is produced when the frames of a Query Stream are sequentially matched and plotted against the Reference Stream.

[0190] Visual Key Collection

[0191] A collection of Visual Key Vectors within the Visual Key Database.

[0192] Visual Key Database

[0193] A database containing Visual Keys and, optionally, other objects such as Contents, and Contexts. In addition, the database optionally may contain Representations and/or Picture Stream Objects.

[0194] A Visual Key Database automatically connects a Picture with its Content, Context and Representation.

[0195] Visual Key Vector

[0196] A set of measurements analyzed from the Digital Image of a Picture, including the Warp Grid Vector.

[0197] Warp Algorithm

[0198] A deterministic process through which the initial Warp Grid is modified geometrically according to the composition of an associated picture. The process is referred to as adapting, and the final state of the Warp Grid is referred to as the adapted Warp Grid.

[0199] Warp Grid

[0200] A geometrical arrangement of points superimposed on the Digital Image of a Picture for purposes of analysis.

[0201] Warp Parameters

[0202] These are the operating parameters for the Warp Algorithm. This set of parameters includes such quantities as the initial grid configuration, the Warp Rate and the Connectivity Pattern.

[0203] Warp Rate (WR)

[0204] Constant governing the speed of displacement of the Warp Grid points.

[0205] Warp Grid Vector

[0206] The collection of all Displacement Vectors derived from adapting a Warp Grid to a Digital Image of a Picture.

[0207] Bin

[0208] Container for lists of frame identification numbers. Each bin corresponds to a particular codon in a gene. During recognition, to form the recognition histogram, a 1 is added to the histogram box corresponding to each frame identification number that appears in a given list. This is done for all of the codons from the gene created from the query frame.

[0209] Codon

[0210] A fixed, specified partitioning of a gene into equal length segments. In the audio application illustrated in FIG. 50, a gene is divided up into 10 codons, each codon of 9 bits.

[0211] Digital Key

[0212] General term for the encoded representation of a media object that is used for automatic recognition. Analogous to a bar code, but derived directly from the media content.

[0213] Frame

[0214] An individual image in a video sequence, or a short fixed duration increment of an audio wave file, or a single line of text in a paragraph. Also, a single still image is a frame.

[0215] Full Projection

[0216] Output of the reticle before it is thresholded, sampled, shuffled and turned into a gene.

[0217] Gene

[0218] Quantized and shuffled reticle projections, uniformly segmented into codons. In the audio example of FIG. 50, a gene of 90 bits is segmented into 10 codons of 9 bits each.

[0219] List

[0220] The partitioning of a bin corresponding to a particular codon into individual ordered collections called lists, there being as many of these collections as there are possible states of a codon. In the audio example of FIG. 50, there are 512 possible states of a 9-bit codon, hence each of the 10 bins corresponding to the 10 codons in the gene has 512 possible states or 512 lists.

[0221] Match

[0222] The reference (learned) frame (any media) that has the most codons in common with the query. The match is determined from the recognition histogram, where the match is the reference frame that has the most intact codons.

[0223] Media Catalog

[0224] The database of genes that indexes some collection of media files by frame numbers.

[0225] Nucleotide

[0226] One bit of a codon.

[0227] Reticle

[0228] A maximal length shift register sequence used to weight the Transformed input frames. Used in various analytical techniques to create a spectrum of pure white noise.

[0229] Sampled Projection

[0230] A pre-determined subset of the thresholded full projection.

[0231] Shift Register

[0232] Mathematical construct or electronic device used to produce the reticle sequence. The shift register with appropriate feedback taps and logic provides a means of generating a pseudo-random sequence of the greatest possible length for any length of shift register.

[0233] Taps

[0234] Positions of the shift register that are sampled and logically combined to form the feedback bit that is used to build the maximal length shift register sequence.

[0235] Thresholded Projection

[0236] A full projection is changed from a series of floats to a series of bits, by using a pre-determined threshold, commonly set at 0.

[0237] Tropic

[0238] A frame number that repeatedly appears in the lists specified by query codons so as to make itself evident in a recognition histogram.

LIST OF VARIABLES

B, b     Number of Bins
C        Set of points which constitute a Connectivity Pattern
c        “Correct” (stored) value of a Warp Grid Vector element
CP       Current Points (check Compute Warp Grid Vector flowchart)
CProb    Conditional probability on a continuously varying value
FM       First moment
i, j     Local integer variables
g        A tumbler in a set of Tumblers
G        Set of Tumblers
K        Index Key
L        Function of image sampled at xy (e.g. level)
M, N     Dimensions of Warp Grid
m, n     Indices of the Warp Grid points
NP       Neighborhood points
P        Picture
p        Point
Prob     Probability
q        A corresponding point in the Warp Grid following some number of iterations of the Warp Algorithm
Q        Query
s        Sampled value
S        Stored value
T        Tumbler
TP       Tumbler Probability
u, v     Coordinate system of a Warp Grid
V        Vector
VS       Stored Visual Key Vector
WR       Warp Rate
x, y     Cartesian coordinates (check Compute Warp Grid Vector flowchart)
ZM       Zeroth moment
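
The Bin, Codon, Gene, List and Match definitions given earlier in this list can be tied together with a short Python sketch. It uses the 90-bit gene, 10 codons and 9 bits per codon of the audio example; representing a gene as a string of '0' and '1' characters, and the names split_codons and MediaCatalog's methods, are assumptions made only for illustration.

```python
from collections import Counter, defaultdict

CODONS = 10          # codons per gene (audio example)
BITS_PER_CODON = 9   # nucleotides per codon, giving 512 possible states

def split_codons(gene_bits):
    """Split a 90-character bit string into 10 codon states (0..511)."""
    if len(gene_bits) != CODONS * BITS_PER_CODON:
        raise ValueError("gene must contain exactly 90 bits")
    return [int(gene_bits[k * BITS_PER_CODON:(k + 1) * BITS_PER_CODON], 2)
            for k in range(CODONS)]

class MediaCatalog:
    """One bin per codon; each bin holds one list of frame ids per codon state."""

    def __init__(self):
        self.bins = [defaultdict(list) for _ in range(CODONS)]

    def learn(self, frame_id, gene_bits):
        for k, state in enumerate(split_codons(gene_bits)):
            self.bins[k][state].append(frame_id)

    def recognize(self, gene_bits):
        """Return the recognition histogram: frame id -> number of intact codons."""
        histogram = Counter()
        for k, state in enumerate(split_codons(gene_bits)):
            histogram.update(self.bins[k].get(state, []))
        return histogram
```

The match is then the frame with the largest histogram count, for example `catalog.recognize(query_gene).most_common(1)`. A query gene that differs from a learned gene in a few bits disturbs only the codons containing those bits, so the remaining codons still vote for the correct frame.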

[0239] Discussion of Pictures

[0240] A picture is a composition of visual elements which may include colors, shapes, lines, dots, arcs, symbols, shadings, textures and patterns to which an observer attaches meaning. A picture's content is the meaning an observer attaches to a picture's composition. A picture's meaning is determined by the observer's visual comprehension of the picture composition and his understanding of its visually comprehensible elements. Picture content can include the Picture's subject(s), subject name(s), subject type, subject activity, subject relationships, subject descriptors, props, keywords, location and setting.

[0241] A Picture's Composition may include another Picture in addition to other visual elements. For example, a page from an art catalog can have many individual Pictures on it, and the page itself is also a Picture.

[0242] Pictures can contain words, or be composed of only words. Although it is not the intention of the present invention to recognize individual characters, words or phrases, it is capable of matching a Picture composed of words when the arrangement, font, style and color of the letters and words in the picture are distinctive and characteristic of the Picture.

[0243] A Picture's Context describes the circumstances of a Picture's existence. Picture Context can include the date, title, owner, artist, copyright, rating, producer, session, roll, frame, material, media, storage location, size, proportions, condition or other information describing a Picture's origin, history, creator, ownership, status, etc.

[0244] Both a Picture's Content and a Picture's Context are described in words, phrases, numbers or symbols, i.e., in a natural language.

[0245] A Picture's Representation is a facsimile of the Picture used to designate the Picture visually. A Web-based thumbnail image is a good example of a Picture Representation. It acts as an icon that can be clicked to access a larger scale Picture. Illustrated catalogs of paintings and drawings, which accompany many art exhibits, contain Representations of the items in the exhibit. A Picture Representation is intended to be a visually identifiable icon for a Picture; it is not generally intended to be a Reproduction of a Picture. It is frequently smaller than the Picture it represents, and generally has less detail. A Picture's Representation may be in a different medium than the Picture it represents. For example, the Picture Representation in Picture Example 1 below is a jpeg file while the Picture Object is a frame of 8 mm film.

[0246] Examples of Pictures

[0247] Picture Examples 1-4 show pictures that might be included in a Visual Key Database. These examples show some (but certainly not all) of the variety of the kinds of Pictures that can be effectively stored and retrieved in a Visual Key Database. In each case, the Representation is followed by the Context and Content.

[0248] Discussion of Picture Objects

[0249] A Picture Object holds information completely describing the composition of a Picture. Examples of Picture Objects include a photographic negative, a photographic print, a 35 mm slide, a halftone magazine illustration, an oil painting, a baseball card, a comic book cover, a clip art file, a bitmapped digital image, a jpeg or gif file, or a hologram. A Picture Object may also hold information in addition to Picture Composition information; for example, a 35 mm photographic negative displays its frame number, while the back of a baseball card generally gives player statistics for the player pictured on the front side.

[0250] A Picture Object may be as simple as a black and white photograph, which records its data as the spatially varying optical density of an emulsion affixed to the surface of a paper backing, requiring only sufficient ambient visible light for its display. Or a Picture's data may be stored as the overlapping regular patterns of transparent colored dots in the four color halftone printing process, requiring the observer's eyes and brain to merge the dots into visual meaning. A single 35 mm slide is a Picture Object that holds a visible Picture, which can be properly displayed by projecting it on a screen with a slide projector. A Picture's data may reside as electrical charges distributed on the surface of a semiconductor imaging chip, requiring a sophisticated string of processes to buffer, decrypt, decompress, decode, convert and raster it before it can be observed on a computer display.

[0251] From the preceding discussion, it may be properly concluded that there are two types of Picture Objects, Visible Picture Objects that record a Picture's data as a directly viewable image, and Virtual Picture Objects that require a special device for creating an image of the recorded Picture.

[0252] Visible Picture Objects usually have relatively flat reflecting, transmitting or emanating surfaces displaying a Composition. Examples include photographs, slides, drawings, paintings, etchings, engravings and halftones. Visible Picture Objects are usually rectangular in format, although not necessarily. They are frequently very thin. We commonly call these Picture Objects “Pictures”. One characteristic of Visible Picture Objects is that they store their Picture's information as varying analog levels of an optically dense or reflecting medium spatially distributed across an opaque or transparent supporting material.

[0253] A Virtual Picture Object can only be observed when its information is converted for display on a suitable display device. A FAX is a Virtual Picture Object that requires a FAX machine to create a viewable paper copy. A clip art file is a Virtual Picture Object that requires a computer equipped with a graphics card and monitor for display.

[0254] Picture Streams

[0255] Picture Objects can be streamed to create the illusion of the subject(s) of the Picture being in motion, hence the term “motion picture”. To achieve the motion illusion, the individual Picture Objects in the Stream contain highly spatially correlated Picture Compositions. In viewing a rapid succession of such streamed Picture Compositions, the viewer's eye and brain fuse the individual Picture Compositions into a single Dynamic Composition, which the viewer's brain interprets as subject motion.

[0256] A reel of movie film is a Picture Stream (noun) consisting of a sequence of individual frames. To stream (verb) the film's Picture Objects we use a movie projector. A VHS tape player streams VHS Tape Cassettes, a DVD player streams DVDs or CDs, and desktop computers stream an ever growing variety of Picture Object Stream formats. An Internet Streaming Video is a Picture Stream that can only be viewed when its information is processed by a computer before being displayed on a monitor.

[0257] VHS video tape stores a Sequence of Pictures whose information is linearly distributed as analog magnetic density levels distributed piecewise linearly along a Mylar tape. When scanned by a magnetic pickup and converted to amplified electrical signals, a sequence of video frames can be displayed on a cathode ray tube for viewing.

THE PREFERRED EMBODIMENTS

[0258] Broad Overview

[0259] The Visual Key Database in this embodiment is preferably software that executes on a general-purpose computer and performs the operations as described herein (the program). Pictures that are not originally in digital form are digitized before being entered into the program. Picture digitization may be performed by any suitable camera, scanning device or digital converter. In general, color scanning is employed throughout this disclosure, but the present invention should not be construed to be limited to the identification of images made in the visible color spectrum. Indeed, the present invention is operative when the images obtained are derived from infrared, thermal, x-ray, ultrasound, and various other sources.

[0260] Nor should the present invention be construed to be limited to static pictures. By rapidly sequencing multiple Pictures, motion pictures and video technologies produce the illusion of motion even though individual frames of the sequence are static. The present invention, by its very nature, has immediate application in identifying the movie or video source of a single frame or a brief snippet of the frame sequence from a database containing a multitude of movies and videos.

[0261] Although the invention is presented here as applied to Pictures that are two-dimensional in nature, there is nothing in the presentation which would not allow it to be extended into lower or higher dimensions as required for applications such as Audio Analysis, Computer Assisted Tomography (CAT Scanned images), Ultrasonic Tomography (Ultrasound Scanned images), Positron Emission Tomography (PET Scanned images) and Magnetic Resonance Imaging (MRI Scanned images).

[0262] The operation of the Visual Key Database consists of two phases, a learning phase and a query phase. Learning a new Picture is a multi-step process. The submitted Picture is converted into a Digital Image and entered into the program. The program creates new database objects for the new Picture and places them in the appropriate database collections. The new database objects are linked together and collectively represent the newly submitted Picture. The program analyzes the Digital Image and places measurements obtained from the analysis into one of the newly created database objects called a Visual Key Vector. It then computes a special binary code called an Index Key from the analysis results and records it in the Visual Key Database object. Finally, the program places all of the Picture's other relevant data into the other appropriate new objects.

[0263] The database can be queried if it contains at least one picture. Pictures are selected from the database by matching the selection criteria specified in the query to objects in the database. When a query contains a Digital Image amongst its query arguments, the program analyzes the Digital Image and constructs a Visual Key and an Index Key. It then locates a matching Index Key if it is present and determines how well the Visual Keys match. If a matching Index Key is not found, or if the Visual Keys do not match sufficiently well, the program constructs another Index Key statistically closest to the first and tries again. Visual Keys of Pictures in the database that match the Query Picture's Visual Key sufficiently well are then further selected by the other specified selection criteria in the query.
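
The query loop described in the preceding paragraph can be sketched as follows. This is a rough illustration under stated assumptions: the database is modeled as a dictionary from Index Key to a list of (picture id, Visual Key Vector) pairs, closeness is measured by Euclidean distance, and "sufficiently well" is modeled as a max_distance cutoff. None of these choices is prescribed by the text, and the name query_database is hypothetical.

```python
import math

def query_database(candidate_keys, database, query_vector, max_distance):
    """Try candidate Index Keys in order until a sufficiently close match is found.

    candidate_keys: iterable of Index Keys, most probable first (may be lazy).
    database:       dict mapping Index Key -> list of (picture_id, vector).
    Returns (picture_id, distance) or None if no candidate key yields a match.
    """
    for key in candidate_keys:
        best = None
        for picture_id, vector in database.get(key, []):
            distance = math.dist(query_vector, vector)
            if best is None or distance < best[1]:
                best = (picture_id, distance)
        # Accept only if this key has entries and the closest one is close enough;
        # otherwise fall through to the next most probable key.
        if best is not None and best[1] <= max_distance:
            return best
    return None
```

Because candidate_keys may be a generator, later keys are only constructed when earlier ones fail, which mirrors the "tries again" step above.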

[0264] A very important feature of the present invention is that theDigital Image of the Picture submitted in the query need not beidentical to the Digital Image of the Picture that was learned in orderfor them to be matched. The only requirement is that both the learnedDigital Image and the query Digital Image be of the same Picture, or avery close facsimile thereof.

[0265] The learned and query Digital Images can differ in many respects,including image file size, spatial resolution, color resolution (bitsper pixel), number of colors, focus, blur, sharpness, colorequalization, color correction, coloration, distortion, format (bitmap,jpeg, gif, etc.), degree of image compression (for jpeg and mpegimages), additive noise, spatial distortion and image cropping. Thedegree to which a query Digital Image and a learned Digital Image candiffer and still be matched by the methods described in this inventionis largely a function of how many Pictures are in the Visual KeyDatabase and the degree of similarity of the Pictures with each other.The greater the differences between the individual Pictures representedin the database, the greater will be the tolerance for Digital Imagedifferences in the matching process.

[0266] A Visual Key Vector derived from a query Digital Image will not always perfectly match the Visual Key Vector in the database, for other reasons generally connected to differences among the devices used to acquire and digitize the images. Considering the device issue, differences will exist between images of the same picture if they are acquired by, respectively, a flatbed scanner and a digital video camera. It is also true that differences generally will exist between two images of the same picture taken at different times, due to imager system noise, variations in picture illumination, etc. Differences often exist between images of the same picture acquired from different representations of the picture (the original Mona Lisa vs. a copy; images of a given page of a magazine acquired from different copies of the magazine, etc.).

[0267] Visual Key Database

[0268] A primary purpose of this invention is to automatically connect a Picture with its Content, Context and Representation. We call this Automatic Picture Identification.

[0269] Another purpose of this invention is to enable a database containing Picture Contents, Contexts and Representations to be searched by Queries constructed not only of words, numbers, dates, dollars, etc., but also by Pictures.

[0270] A principal objective of this invention is to achieve its purposes without requiring the database to store a copy of a Picture's Representation. Although the database may contain Representations of all or some of its Pictures, the Representations are not employed in achieving the invention's purpose. Rather, the Representation is employed primarily as a means of visually confirming that a Picture has been correctly identified.

[0271] The invention presupposes that a given Picture may appear multiple times in the database. Therefore another purpose of the database is to permit a query to retrieve all the Contents and Contexts of a given Picture.

[0272] A primary application of this invention is to automatically associate a Picture with a Picture Object Stream. Another primary application of this invention is to automatically associate a short sequence of streamed Pictures with its parent Picture Object Stream. We call the database described above a Visual Key Database.

[0273] Visual Key Database Description

[0274] A Visual Key Database usually contains four Collections of objects: Visual Key Vectors, Contents, Contexts and Representations. Additionally, a Visual Key Database may contain a fifth Collection of Picture Stream Objects. A Visual Key Database uses its Visual Key Vectors to identify Pictures and make their Contents, Contexts, Representations and parent Picture Streams available. A Visual Key Database programmatically selects a Visual Key Vector from its Collection of Visual Key Vectors by analyzing a Picture submitted to it as a Digital Image. The selected Visual Key Vector then identifies the appropriate Content, Context and Representation for the submitted Picture.

[0275] A Content Object includes the details of a Picture's Content as data. A Content Object also includes methods to store and retrieve either individual data items or predefined Content descriptors that combine individual data items. Similarly, a Context Object includes the details of a Picture's Context as data, and methods to store and retrieve individual data items and Context descriptors combining individual data items. Picture Stream Objects include an Ordered Collection of Picture Objects, which constitute the elements of the Picture Stream. Picture Stream Objects include the details describing a Picture Stream which are not included in the Content and Context Objects of the individual Picture Objects in the Stream.

[0276] An Index Key is an alphanumeric string that identifies a Visual Key Vector for purposes of locating and retrieving it from the database. An Index Key is often, but not necessarily, unique. A Visual Key Vector is a set of measurements derived by analyzing the Digital Image of a Picture.

[0277] Objects in the database can be linked to each other in many ways, eliminating the need for duplication of identical objects. For example, a single Picture may have many different Contexts if it has been published in many different venues. Several Picture Objects, each being a different version of the same underlying Picture, may have the same Content, but different Contexts.

[0278] Visual Key Database Operation

[0279] Pictures are entered into a Visual Key Database by:

[0280] 1. Entering a Digital Image of the Picture,

[0281] 2. Computing a Visual Key Vector and an Index Key for the Digital Image,

[0282] 3. Entering the Picture's Content data in a new Content Object,

[0283] 4. Entering the Picture's Context data in a new Context Object,

[0284] 5. Entering the Picture's Representation in a new Representation Object,

[0285] 6. Linking the new Visual Key Vector, Content, Context and Representation,

[0286] 7. Adding the new Visual Key Vector, Content, Context and Representation to the Visual Key Database.

[0287] Entering a Picture's Content, Context and Representation can be done manually by the user, automatically by an application supplied by the user, or a combination of the two. For example, the user may employ an Image Understanding program, such as one marketed by Virage, Inc., to automatically generate Content data which may then be stored in the Visual Key Database Content Object. The user may employ a Content or Context description from another database. Some Context data may be directly obtainable from the Picture Object, such as file headers for digital image files or SMPTE codes on individual video frames. Picture Representations may be supplied by the user or extracted directly from the Picture's Digital Image.

[0288] Once Pictures are entered into a Visual Key Database, it can be queried. The Visual Key Database is queried with a Picture by:

[0289] 1. Entering the Digital Image of a Picture,

[0290] 2. Computing a Visual Key Vector for the Digital Image,

[0291] 3. Entering a Minimum Acceptable Match Score,

[0292] 4. Computing the most probable Index Key,

[0293] 5. Locating the Index Key in the Collection of Visual Key Vectors, and, if absent, returning to Step 3,

[0294] 6. Computing a Match Score comparing the Visual Key Vector (from Step 2) to the Visual Key Vector contained in the Visual Key identified by the Index Key,

[0295] 7. Returning to Step 3 if the Match Score from Step 6 is less than the Minimum Acceptable Match Score, and

[0296] 8. Returning the Content, Context and Representation linked by the Visual Key identified by the Index Key.

[0297] It should be noted that the computing of the most probable Index Key at Step 4 will necessarily yield an Index Key that has not been previously computed, unless the database contains another copy of the previous Index Key, in which case Step 4 will return the previous Index Key.

[0298] The Match Score is a number between 0 and 100 that indicates how good a Visual Key Vector match is, 100 being a perfect match. Also note that each iteration begins with Step 3 rather than Step 4, allowing the Minimum Acceptable Match Score to be increased as the Visual Key Database is searched deeper and deeper for an acceptable match.
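
The query steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `match_score`, `query`, the dictionary layout of the database and the `min_score_for_probe` callback are hypothetical names, and the scoring formula is a placeholder chosen only so that identical vectors score 100. The ordered `candidate_keys` iterable stands in for Step 4, which supplies Index Keys from most to least probable.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def match_score(a: Vector, b: Vector) -> float:
    """Hypothetical Match Score in [0, 100]; 100 means identical vectors."""
    if len(a) != len(b) or not a:
        return 0.0
    # Assumption: the score decays with the mean absolute element difference.
    mean_abs = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 100.0 / (1.0 + mean_abs)


def query(
    database: Dict[str, List[Tuple[Vector, dict]]],
    query_vector: Vector,
    candidate_keys: Iterable[str],
    min_score_for_probe: Callable[[int], float],
) -> Optional[Tuple[str, float, dict]]:
    """Probe Index Keys in order of decreasing probability (Steps 3 to 8)."""
    for probe, key in enumerate(candidate_keys):
        # Step 3: the minimum acceptable score may change on every pass.
        threshold = min_score_for_probe(probe)
        # Step 5: locate the Index Key; if absent, move to the next candidate.
        entries = database.get(key)
        if not entries:
            continue
        # Step 6: score every Visual Key Vector stored under this Index Key.
        best_vector, linked = max(
            entries, key=lambda entry: match_score(query_vector, entry[0])
        )
        score = match_score(query_vector, best_vector)
        # Step 7: reject weak matches; Step 8: return the linked objects.
        if score >= threshold:
            return key, score, linked
    return None
```

Passing a function such as `lambda probe: 90.0 + 0.1 * probe` for `min_score_for_probe` would raise the threshold as the search goes deeper, which is the behavior the text describes for restarting each iteration at Step 3.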

[0299] Entering Picture Objects into a Visual Key Database

[0300] The following paragraphs are an elaboration of the steps previously outlined, detailing the construction of a Visual Key Database. This section follows the flowchart in FIG. 1, which illustrates the steps involved in entering new Pictures into the Visual Key Database 100.

[0301] The first step in the process is to establish a DO loop to run through all of the pictures to be loaded 101. If the Picture is not already in digital form, it is digitized at 102. The Picture Object may be a paper photograph or a single video frame recorded on a VHS tape cassette. Many techniques exist for converting the Picture Object's Picture data into a Digital Image. Many more techniques exist for manipulating the Digital Image after the picture has been digitized. A primary purpose of the present invention is to be capable of matching the learned Picture even after its image information has undergone multiple levels of copying, reformatting, compression, encrypting and image manipulation. If the Picture is originally in digital form, this step is skipped.

[0302] The next step generates a Visual Key Vector from the Picture's Digital Image 300. A Visual Key Vector is an ordered sequence of computer bytes created by a Visual Key Algorithm with pixel data sampled from the Digital Image. Some of the bytes of a Visual Key Vector are functions of particular regions of the Digital Image. Other bytes of the Visual Key Vector may be based on global image characteristics of the Digital Image. The steps involved in performing the Visual Key Algorithm are illustrated in FIG. 3. The next step (in FIG. 1) involves the selection of the most relevant elements of the Visual Key Vector (V) for storage (as VS) 103. Criteria for selection might be element magnitude (to optimize signal-to-noise ratio) or location of vector origins relative to the image (to maximize independence of vectors or to assure uniform distribution of origins over image space).

[0303] Next, an Index Key (K) must be generated 104. This is accomplished by sampling and quantizing V. The process of computing the Index Key from the Visual Key Vector is explained in the section entitled The Index Key below.

[0304] Once an Index Key has been generated, all of the related pieces can be stored at this index (K) in the database 105. This includes the Stored Visual Key Vector (VS) and its associated Picture Content, Context and Representation. This step combines several related operations, as follows:

[0305] a) Optionally entering the Picture's Content data in a new Content Object. As previously described, the Picture's Content data may include subject, subject name, subject type, subject activity, subject relationships, subject descriptors, keywords, location and setting. Additional user defined Content descriptors can be supplied.

[0306] b) Optionally entering the Picture's Context data in a new Context Object. As previously described, the Picture's Context data may include date, title, owner, artist, copyright, rating, producer, session, roll, frame, material, media, storage location, size, proportions and condition. Additional user defined Context descriptors can be supplied.

[0307] c) Optionally entering the Picture's Representation in a new Representation Object. As previously described, the Picture's Representation is a visually identifiable icon for a Picture.

[0308] d) Linking the new Visual Key Vector, Content, Context and Representation Objects.

[0309] e) Adding the new Visual Key Vector, Content, Context and Representation to the Visual Key Database at its Index Key. The database is then preferably ordered in ascending order of the Index Keys.

[0310] Once this process of loading has been completed for the Pictures at hand, the DO loop is ended 106. Of course, additional Pictures can be added to the Visual Key Database at any time by repeating these steps as necessary.
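
The loading procedure can be sketched as follows. This is an illustration only: the class and field names are hypothetical, the selection of the most relevant vector elements (VS) is omitted, and `index_key` uses an assumed sampling stride and a uniform quantizer, since the actual computation is described in the section entitled The Index Key. The sketch shows the linking of objects under one Index Key and the ascending ordering of keys.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SYMBOLS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def index_key(vector, stride=2, levels=4, lo=-1.0, hi=1.0):
    """Hypothetical Index Key: sample every `stride`-th element of V and
    quantize it uniformly into `levels` bins over [lo, hi]."""
    key = []
    for value in vector[::stride]:
        clipped = min(max(value, lo), hi)
        level = min(int((clipped - lo) / (hi - lo) * levels), levels - 1)
        key.append(SYMBOLS[level])
    return "".join(key)


@dataclass
class PictureEntry:
    """Links a stored Visual Key Vector to its optional descriptive objects."""
    stored_vector: List[float]
    content: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)
    representation: Optional[bytes] = None


class VisualKeyDatabase:
    def __init__(self) -> None:
        self.entries: Dict[str, List[PictureEntry]] = {}
        self.sorted_keys: List[str] = []  # kept in ascending order

    def add(self, vector, content=None, context=None, representation=None):
        """Store one Picture at its Index Key and return that key."""
        key = index_key(vector)
        if key not in self.entries:
            bisect.insort(self.sorted_keys, key)
            self.entries[key] = []
        self.entries[key].append(
            PictureEntry(list(vector), content or {}, context or {}, representation)
        )
        return key
```

Because an Index Key is not necessarily unique, each key maps to a list of entries rather than to a single entry.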

[0311] Querying the Visual Key Database

[0312] This section goes into greater detail on the process of Querying the Visual Key Database; it is an elaboration of the steps previously outlined.

[0313] Once Pictures have been learned, the Visual Key Database can be searched by presenting a query in terms of a Picture and/or auxiliary information related to that Picture. As with other databases, selection criteria may include matching text values, selecting non-negative numerical values or finding a range of dates. In addition, the present invention adds the feature that selection criteria may include choosing all Picture Objects whose Pictures match the Query Picture. The techniques for database querying for all data types other than Pictures are well known and will not be discussed here. Rather, we will focus on the activity of selecting records from a Visual Key Database by presenting Queries that include Digital Images.

[0314] Examples of Visual Key Database Queries include:

[0315] Select all black and white photographs that match the Query Picture with a certainty of 90%.

[0316] Select the Picture Object that best matches the Query Picture.

[0317] Select all magazine advertisements from the period 1950 to 1960 that match the Query Picture with a certainty of 70%.

[0318] Select the frame from the movie “Gone With The Wind” that best matches the Query Picture.

[0319] Obviously, the above list could be extended indefinitely. The important point is that the present invention permits database querying to be expanded to new data types, making possible searches that previously were impossible.

[0320] The flowchart in FIG. 2 illustrates the steps involved in selecting Picture Objects from a Visual Key Database using a Picture as the Query 200. First, the Picture is received as a query (Q) 201 by digitizing it to a Digital Image 202. This step is skipped if the Picture is already in the form of a Digital Image.

[0321] Next, a Visual Key Vector, V_(Q), is generated for the Digital Image of the Query Picture 300. This process is illustrated in FIG. 3. Up through this point, the steps are the same as in the process of loading Pictures into the database.

[0322] In preparation for finding the best match to the Query Picture in the database, we must construct the Query Picture's Tumbler Probabilities 2400. This is identical to the Index Key produced when loading the database, and will be used to compare with the Index Keys in the database to narrow the search. This process is illustrated in FIG. 24.

[0323] In order to decide which Index Keys should be searched, a Squorger tree is constructed 2500. The Squorger methodology, which will be described in detail later, provides a mechanism through which Index Keys can be extracted in order of statistical proximity to the Query Picture's Index Key. The first Index Key to be searched is the one that is identical to the Query Picture's Tumbler Probabilities, which obviously provides a perfect match to itself. The process of constructing the Squorger tree is illustrated in FIG. 25, and is discussed in the section entitled Recursion flowchart below.

[0324] At each probe of the database, the program extracts the next candidate Index Key (K_(p)) from the Squorger 2700. The very first Index Key extracted will match the Query Picture's Tumbler Probabilities exactly. As subsequent probes are made, the Index Key extracted may be farther and farther from the Query Picture's Tumbler Probabilities. This process is illustrated in FIG. 27, and is discussed in the section entitled Detail of Squorger next Method.

[0325] Next, the database is queried to determine whether it contains an Index Key that matches the current Index Key (K_(p)) pulled from the Squorger 203. If no match is found 204, a new Index Key is pulled from the Squorger and another comparison is made (provided we have not decided that we've looked far enough 208). If a match to K_(p) is found, all of the Visual Key Vectors at that Index Key must be compared against the Query Picture's Visual Key Vector to produce a match score 205.

[0326] If the score of the closest of these matches is greater than the Minimum Acceptable Match Score, then we've found the best match to the Query Picture from the Visual Key Database 207. If not, we have to decide whether we have looked sufficiently to be satisfied that the Picture is not contained within the database 208. If we have not, we'll ask the Squorger for the next most likely Index Key and repeat the process 2700. If we have searched enough to be satisfied, we report that it was not found 209. This cycle is repeated until a match is found, in which case we proceed to the next step in the algorithm.

[0327] Although the algorithm is shown to be specific in the criteria for a match, an infinite variety of acceptance criteria could be incorporated into the algorithm (Find the three best matches; find the first five matches all of which have a match score less than x; etc.).

[0328] Visual Key Generation Algorithm

[0329] If the Digital Image of the Query Picture were always identical to the Digital Image of the Matching Picture, then the process of picture matching would be reduced to Digital Image pixel matching, or, as it is called in image processing, template matching. However, in all practical circumstances, pixel matching fails because Digital Images which are very similar in appearance can have very different corresponding pixel values. Local variations in the Digital Image due to artifacts of decompression, additive noise, image distortion, image scaling, focus, color depth, etc. can render template matching completely useless, even though, to an observer, the Digital Images clearly are of the same Picture.

[0330] For this reason and for reasons to be explained, the concept of a Visual Key Vector of a Digital Image of a Picture is introduced. A Visual Key Vector of a Digital Image typically contains two kinds of information, global information and local information. Global information can consist of classical image measures like color histograms and discrete cosine transform coefficients. Local information is contained in the results of applying a Warp Algorithm to the Digital Image. In practice, satisfactory performance of the Visual Key Database system can be realized by computing the Warp Grid Vector 300 alone without the global attributes 305. The decision as to whether to add the global attributes is left to the user, based upon the level of performance desired. This Warp Grid Vector portion of the Visual Key Vector characterizes the Digital Image in a unique way that is not directly tied to specific pixel values. Instead, it uses the relationships between the pixel values across the whole Digital Image to recognize its general visual structure. This then becomes a “signature” or “fingerprint” of the Picture, which survives most variations due to processing artifacts and the other causes previously mentioned.

[0331] Constructing the Visual Key Vector, then, consists of combining the global values to be used with the local values (Warp Grid Vectors) all into a single vector. Here we'll go through the flowchart of this process, shown in FIG. 3. To compute the Visual Key Vector 300, we start with an empty vector (V) 301. A DO loop is set up to go through each of the attributes for which we will generate a Warp Grid Vector 302. The process of computing a Warp Grid Vector for a given attribute 900 is illustrated in FIG. 9 and explained in the section entitled Computing the Warp Grid Vector. This Warp Grid Vector is then appended to V 303, until all of the Warp Grid Vectors are included 304 in the new vector (V).

[0332] Next we must append the global attributes to V. A DO loop is set up to go through each of the global attributes to be included 305. For each of these attributes, we'll do whatever is required to compute the attribute 306. As mentioned previously, these could be any overall attributes of the Digital Image, including classical image measures like color histograms and discrete cosine transform coefficients. The vector thus produced is then appended to V 307, until all of the global image attribute vectors are included in the new vector (V) 308, which is then returned as the Visual Key Vector 309.
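
The two DO loops of FIG. 3 amount to concatenating per-attribute vectors. The sketch below illustrates that structure only. The function names, the representation of an image as rows of (r, g, b) tuples, and the `warp_grid_vector_fn` callback are assumptions made for the example, and `mean_red` is a stand-in for a real global attribute such as a color histogram.

```python
def mean_red(image):
    """Example global attribute (an assumption): mean red value as a 1-element vector."""
    pixels = [pixel for row in image for pixel in row]
    return [sum(pixel[0] for pixel in pixels) / len(pixels)]


def compute_visual_key_vector(image, warp_attributes, global_attributes,
                              warp_grid_vector_fn):
    """Build V by appending Warp Grid Vectors, then global attribute vectors."""
    v = []  # 301: start with an empty vector
    for attribute in warp_attributes:  # 302 to 304: local information
        v.extend(warp_grid_vector_fn(image, attribute))
    for compute_attribute in global_attributes:  # 305 to 308: global information
        v.extend(compute_attribute(image))
    return v  # 309
```

Passing an empty list for `global_attributes` corresponds to the case, noted earlier, in which the Warp Grid Vector alone is used.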

[0333] Warp Grid Adaptation Examples

[0334] During the following explanations of the Warp Grid Algorithm we will make use of examples based on the Picture on the front of a 1985 Topps Mark McGwire Rookie Baseball Card. This Picture example is chosen because the Picture Object has recently enjoyed a substantial rise in its value, and there is a piqued interest in recognizing it from thousands of other cards.

[0335] FIG. 4 is a black and white representation of the Picture on the card's front side. The Representation is a black and white version of a full color Digital Image, which is 354-by-512 pixels and 24 bits in color depth. The card borders are white, hence they do not contrast against the white, paper background of the card illustration.

[0336] FIG. 5 is a black and white representation of the red channel only of the Digital Image represented in FIG. 4. Red pixel brightness values range from 0 to 255, represented here as grey values ranging from black to white respectively.

[0337] The Warp Algorithm

[0338] Rather than analyzing the Digital Image in terms of specific pixel values, the Warp Algorithm recognizes broader patterns of color, shape and texture, much as the human eye perceives the Picture itself. Now we'll look in detail at how the Warp Grid Vector is derived by applying the Warp Algorithm to a Digital Image. The reason it is called a Warp Algorithm will soon become apparent. Note that the Digital Image itself is never changed in the process of applying the Warp Algorithm to the Digital Image.

[0339] An Initialized Warp Grid is an M row-by-N column grid of points contained within a rectangle 1 unit on a side centered at and oriented to a Cartesian Coordinate System, with coordinates u and v. The grid points of an Initialized Warp Grid are preferably evenly spaced over its bounding rectangle. This Initialized Warp Grid is superimposed upon the Digital Image, in preparation for adapting it to the pictorial content of the Digital Image. FIG. 6 illustrates a 16-by-24 Warp Grid plotted in the UV Coordinate System. All Grid Points are uniformly spaced and centered within a 1-by-1 rectangle, illustrated here by the solid border around the grid. Although the Grid Points are represented here by rectangular arrays of black pixels, the actual Grid Point is a pair of floating point numbers, (u,v).
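
A minimal sketch of constructing such a grid follows. The function name is hypothetical, and placing each point at the center of its cell (leaving a half-cell margin inside the unit square) is an assumption: the text requires only that the points be evenly spaced and centered within the 1-by-1 rectangle.

```python
def init_warp_grid(rows, cols):
    """Return grid[row][col] = (u, v), evenly spaced in a unit square
    centered on the origin, so u and v lie strictly inside [-0.5, 0.5].

    Assumption: points sit at cell centers rather than on the border.
    """
    return [
        [((col + 0.5) / cols - 0.5, (row + 0.5) / rows - 0.5)
         for col in range(cols)]
        for row in range(rows)
    ]
```

For example, `init_warp_grid(16, 24)` yields 384 Grid Points, each a pair of floating point numbers.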

[0340] FIG. 7 represents the initialized 16-by-24 Warp Grid of FIG. 6 superimposed on the red channel of the Digital Image. In this case, the Warp Grid is superimposed on the Digital Image by matching their borders. The points of the Warp Grid are illustrated by rectangular black dots, enlarged to 4-by-5 pixels for easy visibility. Note that the border at the top and left edges of the card is an artifact of the process used to capture the image for publication.

[0341] Computing the Warp Grid Vector

[0342] Referring to FIG. 9, we go through the process of computing the Warp Grid Vector 900. First we must determine the Warp Grid bounds on the image in terms of xy space 901. Commonly this will be a rectangle that corresponds to the bounds of the Digital Image itself; however, the Initialized Warp Grid need not be uniformly spread over the Digital Image. It may occupy just a portion of the image, or the points in the Initialized Warp Grid may be non-uniformly spaced. The rectangular shape of the Warp Grid may be distorted in the process of superimposing it on the Digital Image. Extending the permissible geometries of the regions in the Digital Image to which the Warp Grid is applied to include any bounding quadrilateral, not just rectangles, allows the Warp Grid to be much more flexible. This feature is particularly useful when the Digital Image of a rectangular Picture Object like a picture post card is obtained from a camera that is not positioned on a perpendicular centered on the Picture Object, thus yielding a perspective distorted rectangle. The image of the perspective distorted Post Card is in general a quadrilateral contained in the Digital Image. Given the positions of the four corners of the quadrilateral, the rectangular Warp Grid can be rotated and stretched to fit the Post Card imaged geometry. A quadrilateral superimposition of a rectangular Warp Grid is illustrated in FIG. 8.
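
One simple way to fit the unit-square grid to four known corners is bilinear interpolation, sketched below. This is an assumption made for illustration: the text does not specify the mapping, and bilinear interpolation is a simplification rather than a true perspective (projective) transform. The function name, the corner ordering and the convention that v increases downward are likewise hypothetical.

```python
def map_to_quad(u, v, corners):
    """Map (u, v) in [-0.5, 0.5] x [-0.5, 0.5] to image (x, y) coordinates.

    corners = (top_left, top_right, bottom_right, bottom_left), each (x, y).
    Assumption: bilinear interpolation with v increasing downward.
    """
    s, t = u + 0.5, v + 0.5  # shift to the range [0, 1]
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    # Interpolate along the top and bottom edges, then between them.
    top = (x0 + s * (x1 - x0), y0 + s * (y1 - y0))
    bottom = (x3 + s * (x2 - x3), y3 + s * (y2 - y3))
    return (top[0] + t * (bottom[0] - top[0]),
            top[1] + t * (bottom[1] - top[1]))
```

When the four corners form an axis-aligned rectangle matching the image bounds, this reduces to the common case of superimposing the grid by matching borders.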

[0343] Additionally, grid lattice geometries other than the rectangular grid may be used, such as hexagonal, spiral, radial, etc. This disclosure will focus exclusively on the rectangular grid, but it will be apparent to one skilled in the art that equivalent results can be obtained with grids of points generated in other geometries.

[0344] In general, the number of points in the Warp Grid is considerably less than the number of pixels in the Digital Image. Each Warp Grid Point specifies the location of a single pixel memory access of the digital image at each iteration of the Warp Grid Algorithm. Therefore, the total number of pixel memory accesses performed in the Warp Algorithm is typically less than the total number of pixels in a digital image, and frequently much less.

[0345] Warp Grids of more or fewer points may be employed as determined by the desired performance and size of the Visual Key Database implementation. In general, the greater the number of Warp Grid points, the greater will be the sensitivity of the Visual Key Database to the details of the Composition of the Pictures it contains.

[0346] The next step is to initialize the points on the Warp Grid 902. Points in the Warp Grid (Grid Points) are indexed as the m^(th) column and n^(th) row, starting with 0 at the upper left-hand corner of the grid. This index represents the identity of each Grid Point, and does not change no matter how much the location may change. Each Grid Point keeps track of its location by recording its u and v coordinates. Each Grid Point also records its level, which will be discussed shortly. Upon initialization of the Warp Grid, startingPoints and currentPoints are both set to the initial collection of Grid Points 902. startingPoints remains unaltered, and represents the record of the original location of the Grid Points. currentPoints is the set of Grid Points that is actually adapted with each iteration of the Warp Algorithm.

[0347] With the Warp Grid fully initialized, we begin the iterative process of adapting it to the Digital Image. Each iteration moves each of the currentPoints to a new location based on the sampled values at its current location as well as the motion of its neighbors 1000. This process is illustrated in FIG. 10 and is explained in the section Adapting the Warp Grid.

[0348] After each iteration of the process, we must decide whether the Warp Grid has been adapted sufficiently to fully characterize the Picture 904. If not, we must adapt it another step. If it has been adapted sufficiently, we have enough information to create the Warp Grid Vector, simply by taking the difference between each of the currentPoints and their corresponding startingPoints 905. Each of these values (typically a floating point number) becomes one element of the Warp Grid Vector, which is then returned 906.
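
The control flow of FIG. 9 can be sketched as below. The adaptation step itself is described in the section Adapting the Warp Grid, so here it is passed in as a callback. `adapt_step`, `is_adapted` and `max_iterations` are hypothetical names, and flattening each displacement into two consecutive vector elements (du, dv) is an assumption about the element layout.

```python
def compute_warp_grid_vector(starting_points, adapt_step, is_adapted,
                             max_iterations=250):
    """Adapt a copy of the starting points, then return the displacements.

    starting_points: list of (u, v) tuples, never modified.
    adapt_step: callable taking and returning a list of (u, v) tuples.
    is_adapted: callable(iteration_count, points) -> bool (test 904).
    """
    current_points = list(starting_points)
    for iteration in range(1, max_iterations + 1):
        current_points = adapt_step(current_points)
        if is_adapted(iteration, current_points):
            break
    vector = []
    # 905: each element is a difference between current and starting values.
    for (u0, v0), (u, v) in zip(starting_points, current_points):
        vector.extend((u - u0, v - v0))
    return vector
```

A fixed iteration count is obtained by passing `lambda iteration, points: False` for `is_adapted` and setting `max_iterations` to the desired count.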

[0349] How do we decide when the Warp Grid is fully adapted to the Digital Image? This can be done in a couple of ways. We can decide on a fixed number of iterations, decrement a counter each time the Warp Grid is adapted, then simply stop when the counter has been decremented to 0. The number of iterations used would be chosen based on experiments with large numbers of representative images. We can also make use of the behavior of the Warp Grid points themselves. In order to do that, we must take a closer look at the behavior of Warp Grids and the factors that alter that behavior.

[0350] The overall process of sampling the Digital Image and offsetting the currentPoints in the direction of the center-of-gravity of each grid point's Connectivity Pattern deforms the Warp Grid at each iteration. This is why the process is called a “Warp” Algorithm. In general, each successive iteration of the Warp Algorithm deforms the grid further and further until a near equilibrium is reached, after which the points move very little or very slowly or not at all.

[0351] In other words, the Warp Algorithm does not deform the Warp Grid indefinitely. After a suitably large number of iterations, typically fewer than 100, an equilibrium condition occurs where the grid points will displace no further with additional Warp Algorithm iterations. Many grid points sit perfectly still, but some points will irregularly oscillate with relatively small movement around an equilibrium position. Grid Points in equilibrium self organize spatially over a Digital Image. A Grid Point finds itself in equilibrium when the tensions exerted by all its connecting Grid Points balance and cancel each other out. A Warp Grid achieves equilibrium when all its Grid Points have achieved equilibrium.

[0352] The equilibrium configuration does not depend on individual pixel values in a Digital Image. Rather, it depends only on the patterns of color, shape, texture, etc. that comprise the Composition of the Picture and its Digital Image. As the Warp Grid adapts to a Digital Image, each Grid Point walks a path determined by the n points in its Connectivity Pattern, guided like a blind man by n hungry seeing eye dogs, leashed and pulling him in n directions, until finally their pulls balance and he is held stationary. As the Warp Grid adapts, the footprint of the n grid points composing a given Connectivity Pattern adapts itself from its initial pattern to a new freeform configuration, which conforms to the Composition of the Digital Image in a particular region.

[0353] If two different Digital Images are prepared from the same Picture, perhaps differing in resolution and image artifacts introduced by the method of compression, e.g. gif or jpeg, the equilibrium configuration of the adapted Warp Grid for each Digital Image will be the same or very nearly so.

[0354] The Equilibrium Configuration of a given Warp Grid is primarily determined by the Composition of the Picture represented in the Digital Image. However, it can also be affected by the Neighborhood Radius (NR), one of the constants used in the Warp Algorithm. Another such constant, the Warp Rate (WR), does not have a significant effect on the Equilibrium Configuration (see FIG. 14).

[0355] WR globally alters how far each of the currentPoints moves at each iteration; NR determines which of a point's immediate neighbors exert a direct influence on its motion. Both of these concepts will be explored in depth later, but it is important to note that there are settings of these constants that will cause the Warp Grid never to reach a stable equilibrium. For example, if WR is too high, points may jump past their point of equilibrium in one step, and jump back past it in the next, i.e., they will oscillate. In general, values of WR<1 will permit an equilibrium configuration to be reached.

[0356] The WR can be changed at each iteration of the Warp Algorithm. Reducing the WR as equilibrium is reached produces a more stable and repeatable equilibrium configuration. This process is termed “Synthetic Annealing” in some image processing texts.

[0357] Rather than depending upon a fixed number of iterations, the test for determining when to end the adaptation process could be based on how close the Warp Grid has come to its equilibrium configuration. One possible measure for determining how far towards equilibrium the adaptation has come is simply the total of all the individual grid point offset magnitudes in a single iteration. As equilibrium is approached, the total offset magnitude at each iteration approaches zero.
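
The stopping measure described above, together with a per-iteration reduction of WR, might be sketched as follows. The function names, the tolerance, and the geometric decay schedule with a floor are assumptions chosen for illustration; the text specifies only that the total offset magnitude approaches zero and that WR may be reduced as equilibrium is reached.

```python
import math


def total_offset_magnitude(previous_points, current_points):
    """Sum of the distances moved by every Grid Point in one iteration."""
    return sum(
        math.hypot(u1 - u0, v1 - v0)
        for (u0, v0), (u1, v1) in zip(previous_points, current_points)
    )


def has_converged(previous_points, current_points, tolerance=1e-3):
    """Assumed test: stop once the total movement falls below a tolerance."""
    return total_offset_magnitude(previous_points, current_points) < tolerance


def next_warp_rate(warp_rate, decay=0.95, floor=0.05):
    """Hypothetical schedule: shrink WR geometrically, never below `floor`."""
    return max(warp_rate * decay, floor)
```

Because some points may keep oscillating slightly around equilibrium, a tolerance above zero, or a cap on the iteration count, is needed for the loop to terminate.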

[0358] The graph in FIG. 11 illustrates the number of adaptation process steps to reach equilibrium for different Warp Rates. Magnitude is the magnitude of the Warp Grid Vector, the vector whose elements are the individual Warp Grid Displacement Vectors. It increases monotonically until equilibrium is reached, thereafter fluctuating around an equilibrium value. As can be seen from the graph, 250 iterations are sufficient to reach equilibrium for all of the cases illustrated.

[0359] Warp Grids characterize digital images in an extremely efficient manner, both in their construction and their storage. Warp Grids are adapted by sampling the digital image, and, in general, the number of samples required to adapt a Warp Grid is significantly less than the total number of pixels in the digital image. Warp Grids characterize digital images only to the extent needed to distinguish them from one another. Digital images cannot be recovered from Warp Grid data, i.e., Warp Grids are not a form of digital image compression.

[0360] Adapting the Warp Grid

[0361] Connectivity Patterns

[0362] In order to understand the process of adapting the Warp Grid to the Digital Image, we must understand the concept of a Connectivity Pattern. A Warp Grid Connectivity Pattern determines how Warp Grid points connect to each other and how the movement of their neighbors affects their individual movements. Each given point in the Warp Grid is directly connected to a finite set of other points in the grid, known as the Connectivity Pattern. An Initialized Warp Grid is completely characterized by the locations of all its points and its Connectivity Pattern.

[0363]FIG. 12 illustrates three possible Connectivity Patterns for theInitialized Warp Grid illustrated in FIG. 6. The Connectivity Patternsrepresented here are called Neighborhood Configurations. TheNeighborhood. Configuration consists of a central point surrounded bysquare layers of surrounding points. A Neighborhood Configuration isdefined by its Neighborhood Radius (NR), which is the number of layerssurrounding the central point. The lines connecting the central point toits surrounding points symbolize the dependency of the central point onits neighbors.

[0364] At each iteration of the Warp Algorithm, the positions of all theWarp Grid points 5 are modified based on sampled, pixel values in theDigital Image. Although Warp Grid points move, their ConnectivityPattern never changes. Points in the Warp Grid remain connected to thepoints in their respective Connectivity Pattern regardless of thepositions of those points in the u,v space.

[0365] The Connectivity Pattern is homogenous over the entire Warp Grid.Every point has the same configuration of connections, even though itmay lie at or near an edge of the Warp Grid. Points that lie along theedges of the grid are connected to grid points on the opposite side ofthe grid, i.e., points along the top of the grid connect to points alongthe bottom of the grid, points along the left edge of the grid connectto points along the right edge of the grid. In terms of the Warp Gridpoint indices m and n, the indices are computed as m mod M and n mod N,where M and N are the dimensions of the Warp Grid.
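The wrap-around indexing of a Neighborhood Configuration can be sketched as below. The function name is hypothetical. The central point is included in the returned list, on the assumption that the Connectivity Pattern contains the point itself.

```python
def neighborhood_indices(m, n, M, N, NR):
    """Indices of the square neighborhood of radius NR around grid point (m, n).

    Indices that run off one edge of the M-by-N grid re-enter at the
    opposite edge, computed as m mod M and n mod N.
    """
    return [((m + dm) % M, (n + dn) % N)
            for dm in range(-NR, NR + 1)
            for dn in range(-NR, NR + 1)]
```

For a corner point such as (0, 0) on a 4-by-4 grid with NR = 1, the list contains nine entries, including (3, 3) from the opposite corner.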

[0366] In general, both the Digital Image and the Warp Grid are treated as if they were totally wrapped around a toroidal surface with opposite edges joined and the surfaces made seamless and homogeneous. Toroidally wrapping the Digital Image and the Warp Grid in this manner eliminates the concerns of edge effects in the calculation of the Warp Grid.

[0367] Two representations of the toroidal wrapping of the Neighborhood Points are illustrated in FIG. 13. The Grid Point whose neighborhood is displayed is circled in both figures. Although this Grid Point is located in the upper right corner of the Picture, it is directly connected to Grid Points on all four corners. This Grid Point is treated as though it is in the center of the Picture in terms of its relationship with all other Grid Points on the Picture.

[0368] Although the Equilibrium Configuration is relatively independent of the Warp Rate, it is definitely affected by the Connectivity Pattern. FIG. 15 illustrates the effect of the Connectivity Pattern on the resulting Equilibrium Configuration, showing three different Equilibrium Configurations arising from three different Connectivity Patterns. Although the effect is not drastic, the larger the Connectivity Pattern, the greater the influence of the large, bright regions in the picture. This is best seen in this illustration by examining the head of the bat in the picture. As the Neighborhood Radius increases, the number of Warp Grid points attracted to the bat decreases, as they are drawn further towards the face and neck of the player.

[0369] Sampling the Digital Image

[0370] The Warp Grid rectangle is superimposed on the bounding rectangle of the Digital Image. Each superimposed Warp Grid point falls within the local boundaries of a unique pixel in the Digital Image. The Warp Grid Point samples the pixel, that is, it takes its level instance variable from the sampled pixel. The sampled level may, in general, be any single-valued function of one or more variables measurable in a pixel in the Digital Image. For example, if the Digital Image is a grayscale Digital Image, then the sampled level can be the gray level of the pixel that the grid point falls in. If the Digital Image is a full color Digital Image, then the sampled value could be the level of the red component of the pixel containing the sampling point, the green or blue component, or a combination of one or more color components such as the hue, saturation or lightness of the pixel containing the sampling point. The sampled levels of all points in the Warp Grid are determined prior to the next step in a Warp Grid iteration.
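A sampling step of this kind might be sketched as follows. The sketch makes several assumptions not stated in the text: u and v are normalized to the interval [0, 1) across the image, the image is a list of pixel rows, and `level_fn` (a hypothetical parameter) is the single-valued function applied to the sampled pixel.

```python
def sample_level(image, u, v, level_fn=lambda px: px):
    """Return the level of the pixel that the point (u, v) falls in.

    image    -- list of rows, each row a list of pixel values
    u, v     -- position, assumed normalized to [0, 1) and wrapped toroidally
    level_fn -- maps a pixel value to a level (identity for grayscale)
    """
    rows = len(image)
    cols = len(image[0])
    # Map the wrapped u,v position to integer pixel coordinates x,y.
    x = int((u % 1.0) * cols) % cols
    y = int((v % 1.0) * rows) % rows
    return level_fn(image[y][x])
```

For a full color image stored as (r, g, b) tuples with 8-bit channels, the normalized red level could be sampled with `level_fn=lambda px: px[0] / 255`.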

[0371] Although the quantity sampled at a sampling point in a Digital Image is typically the level of a color attribute of a pixel, the present invention should not be restrictively viewed as only pertaining to color. For example, pixel values could represent temperature, emissivity, density or other quantities or combination of quantities, any of which could be arranged spatially in a Digital Image format. Though they may be color coded for enhanced visualization, they are not in any way directly connected to color values.

[0372] Adapting the Warp Grid a Single Step

[0373] The Warp Grid is spatially adapted to the Digital Image. Each given point of the grid is displaced in the u,v Coordinate System from its current position by an offset vector determined from the sampled values and positions of all grid points that belong to the Connection Pattern of the given grid point. Every point in the Warp Grid is simultaneously displaced in accordance with the offset calculation described in the following paragraphs.

[0374] In the most basic of the methods for computing the offset vector to be applied to a given Grid Point in the Warp Algorithm spatial adaptation step, the offset vector is calculated from the positions and sampled values of all the Grid Points in the Connection Pattern of the given Grid Point. In particular, the offset vector for the given Grid Point is calculated as a scaling of the position of the center-of-gravity of all of the points in the Connection Pattern of the given point relative to the position of the given point. In the center-of-gravity calculation, the individual Connection Pattern Grid Points are weighted by their level obtained from the previous step of sampling the Digital Image at all Grid Point positions.

[0375] Mathematically, if p₀ denotes the position of a given point in a Warp Grid measured in the u,v coordinate system and {C₀} denotes a set of C points which constitute the Connectivity Pattern of p₀, including p₀ itself, then the center of gravity of the Connectivity Pattern $p_{\{C_0\}}^{CG}$ is given by:

$p_{\{C_0\}}^{CG} = \frac{\sum_{p \in \{C_0\}} L(p)\,p}{\sum_{p \in \{C_0\}} L(p)}$

[0376] where L(p) is the sampled level of the Digital Image at the point p.

[0377] The offset to be applied in displacing the point p₀ is calculated from the center-of-gravity $p_{\{C_0\}}^{CG}$ as

$p_0^{offset} = WR\,\left(p_{\{C_0\}}^{CG} - p_0\right)$

[0378] A corresponding new point, $p_0^{new}$, in the succeeding iteration is calculated from the preceding point p₀ and the center of gravity $p_{\{C_0\}}^{CG}$. The displacement coefficient (Warp Rate) WR is a number generally less than one that is held constant over all points in the Warp Grid at a given iteration of the adaptation step. In particular, the new point $p_0^{new}$ is calculated as:

$p_0^{new} = p_0 + p_0^{offset}$

[0379] For WR equal to 1, at each iteration of the Warp Grid Algorithm, each Warp Grid Point is displaced to the position of the center-of-gravity of its Connection Pattern, where the connecting points are weighted by their values taken from the Digital Image. For values of WR less than 1, the grid points are adapted in the direction of the center-of-gravity a distance proportional to WR. A connecting point thus influences the repositioning of a given grid point in proportion to the product of its level and its distance from the given grid point.
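The center-of-gravity displacement of a single point can be illustrated with a short sketch. The function names are hypothetical, toroidal wrapping is ignored for brevity, and leaving the point unmoved when all levels are zero is an assumption, since the text does not address that case.

```python
def center_of_gravity(points, levels):
    """Level-weighted center of gravity of a Connection Pattern.

    points -- list of (u, v) positions, including the given point itself
    levels -- sampled level L(p) for each position
    Returns None when the levels sum to zero (center of gravity undefined).
    """
    zm = sum(levels)
    if zm == 0:
        return None
    cg_u = sum(L * p[0] for L, p in zip(levels, points)) / zm
    cg_v = sum(L * p[1] for L, p in zip(levels, points)) / zm
    return (cg_u, cg_v)


def adapt_point(p0, points, levels, wr):
    """Displace p0 towards the center of gravity by the Warp Rate wr."""
    cg = center_of_gravity(points, levels)
    if cg is None:
        return p0  # assumed behavior: no displacement without any level
    offset = (wr * (cg[0] - p0[0]), wr * (cg[1] - p0[1]))
    return (p0[0] + offset[0], p0[1] + offset[1])
```

With two points at (0, 0) and (1, 0) carrying levels 1 and 3, the center of gravity lies at (0.75, 0). A Warp Rate of 1 moves the point at the origin all the way there, while a Warp Rate of 0.5 moves it to (0.375, 0).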

[0380] Interestingly, the WR does not necessarily have a large effect on the final Warp Grid, provided it has gone through enough iterations to reach its equilibrium point. In FIG. 14, we see examples of two different WR settings (0.1 and 0.5) on the same Warp Grid after 250 iterations.

[0381] The WR can be used very effectively to accelerate the process of bringing the Warp Grid to its Equilibrium Point and improve the stability of that equilibrium. This is accomplished by reducing the WR as the Grid Points approach their Equilibrium Point. As the change in position between steps decreases, the WR is also reduced. Thus we can use a large WR at first to advance the Grid Points boldly, then reduce it to settle in on the Equilibrium Point without overshooting or oscillating.

[0382] As previously discussed, the level taken by a given Grid Point is derived as a function of the attributes of the Digital Image pixel sampled at the given Grid Point position. The usual pixel attributes are the intensities of the Digital Image's three color channels. The value of a Grid Point is generally a floating point number in the range 0 to 1 and may represent any function of its sampled pixel attributes. If, for example, the value is selected to be the normalized intensity r of the red channel in the Digital Image (normalized to the interval 0 to 1), then the Warp Grid points will be seen to be attracted to the red areas of the Digital Image in the Warp process, the brightest red areas having the most attraction. If, on the other hand, the value is chosen to be 1−r, then the points of the grid will be attracted to those areas of the Digital Image where red is at a minimum.

[0383] In computing the position of the center-of-gravity of the Connectivity Pattern of a given Grid Point p₀, either the actual levels of all the Grid Points in the Connectivity Pattern may be used or the values may be taken relative to the level of the given Grid Point. For example, if L(p) denotes the level of a Grid Point, then a relative level for the Grid Point p in the Connectivity Pattern of p₀ could be the absolute difference between the level at p and the level at p₀, i.e., |L(p)−L(p₀)|. In this case, supposing that the L(p) are proportional to the red channel intensities, the Warp Grid will be seen to deflect locally in the direction of the strongest red channel contrast, that is, an area of the Digital Image containing an edge or other abrupt change in the red component of the Picture. On the other hand, if the Connectivity Pattern Grid Point levels are computed as 1−|L(p)−L(p₀)|, then the Warp Grid will be seen to displace locally in the direction of uniformly red colored areas.

[0384] If the Grid Point weightings in the center-of-gravity calculation are computed as L(p)−L(p₀), then only positive contrasts will attract Grid Points, while negative contrasts will repel them. Here, positive contrast is defined as an increasing level L(p) in the direction of positive u and v.
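The alternative weightings discussed in the preceding paragraphs can be collected in one small sketch. The function name and the mode names are hypothetical labels chosen for this illustration. Only the four formulas themselves come from the text.

```python
def neighbor_weight(level_p, level_p0, mode="absolute"):
    """Weight given to a Connectivity Pattern point p in the center of gravity.

    level_p  -- L(p), level sampled at the neighboring point
    level_p0 -- L(p0), level sampled at the given point
    mode     -- hypothetical label selecting one of the weightings
    """
    if mode == "absolute":      # actual level: attracted to bright areas
        return level_p
    if mode == "contrast":      # |L(p) - L(p0)|: attracted to edges
        return abs(level_p - level_p0)
    if mode == "uniformity":    # 1 - |L(p) - L(p0)|: attracted to uniform areas
        return 1.0 - abs(level_p - level_p0)
    if mode == "signed":        # L(p) - L(p0): positive contrast attracts,
        return level_p - level_p0  # negative contrast repels
    raise ValueError("unknown weighting mode: " + mode)
```

Note that the signed weighting can produce negative weights, so a center-of-gravity routine using it must tolerate a zero or negative sum of weights.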

[0385] FIG. 16 illustrates the initial configuration and the first two iterations of the Warp Algorithm as applied to the Neighborhood centered at row 9, column 10 of the initialized Warp Grid shown in FIG. 7. The left column of the figure illustrates the neighborhood superimposed on a portion of the Digital Image, while the column on the right illustrates the levels of the Digital Image sampled at the positions of the Neighborhood Points. At each iteration of the adaptation algorithm, the center-of-gravity of the neighborhood points, which are weighted by their sampled levels, is computed. The computed centers-of-gravity for the configurations in the column on the right are shown by the cross hairs. The Warp Rate in this illustration has been set to 1 so that new grid points are displaced to the position of the center-of-gravity of their Connection Pattern.

[0386] Although in the discussion of the steps of the Warp Algorithm the example of the center-of-gravity of the Connectivity Pattern is used throughout, any function of the Connectivity Pattern Grid Point positions and levels can be used for computing the offsets in the adaptation step of the Warp Algorithm. For example, rather than the center-of-gravity, the offset vectors could be computed as being proportional to the vector drawn from a given Grid Point to the Grid Point in its Connectivity Pattern with the highest level. But not all functions will yield an equilibrium configuration of the Warp Grid.

[0387] In the preceding discussions, the Digital Image, Warp Grid and Connectivity Pattern are all taken as being two-dimensional. However, nothing in the preceding discussion would preclude the methods described from being applied in one dimension or in three or higher dimensions. Indeed, the methods described herein would be extremely useful in the analysis of three-dimensional Digital Images, which occur as the computed output of certain medical imaging systems.

[0388] Now we'll go through the flowchart in FIG. 10, which illustrates the process of adapting the Warp Grid a single iteration 1000.

[0389] First we set up a DO loop on currentPoints 1001. For each point CP_(i), we do a coordinate transform to translate its u,v location into x,y 1002. Then we store the sampled level at that x,y location on the Digital Image in L_(i) 1003.

[0390] When all the points have had their levels L_(i) sampled and stored, the DO loop is ended 1004 and we move on to adapting the currentPoints a single step. It should be noted that L at each point could be sampled as part of the following loop, in which the positions of the points are actually adjusted 1008. The reason for not doing this is one of optimization. By storing the levels for each point for the duration of each iteration, we only have to sample each point one time (for a total of M×N sampling steps). Thus we avoid having to resample these points each time they are accessed as part of evaluating the Neighborhood Points' effect on each point (for a total of M*N*(2*NR+1)² sampling steps).

[0391] With the L of each of the currentPoints stored, we once again sweep through all of the currentPoints with a DO loop 1005. First we set the variables ZM and FM to their initial empty values 1006. ZM (Zero^(th) Moment) will be the sum of the levels of the Neighborhood Points; FM (First Moment) will be the sum of the levels of individual neighborhood points weighted by their distance from the given point, CP_(i).

[0392] Next we set up a DO loop on the Neighborhood Points (NP_(j)) of the current point CP_(i) 1007. The points that comprise NP_(j) are a function of the Warp Grid's Connectivity Pattern, here described in terms of the Neighborhood Radius (as discussed in the section Connectivity Patterns).

[0393] For each of the points NP_(j), its level L_(j) is added to the Zero^(th) Moment ZM, while its First Moment, defined as L_(j) scaled by the difference between the points NP_(j) and CP_(i) 1008, is summed in FM. When all the Neighborhood Points have been processed, the DO loop is ended and a newPoint can be calculated 1009. The newPoint is defined as the current point CP_(i) displaced by the offset to the center-of-gravity of the Neighborhood Points (FM/ZM) scaled by the Warp Rate (WR) 1010. The newPoint is added to the collection newPoints, and the loop is repeated for the next CP_(i), until all of the currentPoints have been processed and the DO loop is ended 1011. The newPoints then replace the currentPoints 1012 and the currentPoints are returned 1013.
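The flowchart steps just described can be sketched in code as follows. This is an illustration under stated assumptions rather than the disclosed implementation: u and v are taken to be normalized to [0, 1), differences between points are taken as the shortest displacement on the torus, a point whose neighborhood levels sum to zero is left unmoved, and `sample` is a hypothetical callable returning the level at a u,v position.

```python
def wrapped_delta(a, b):
    """Shortest signed displacement from b to a on a circle of length 1."""
    d = (a - b) % 1.0
    return d - 1.0 if d > 0.5 else d


def adapt_single_step(current_points, sample, NR, WR):
    """Adapt an M-by-N Warp Grid by one iteration.

    current_points -- nested list indexed [m][n] of (u, v) tuples
    sample         -- callable (u, v) -> level, assumed to wrap toroidally
    NR, WR         -- Neighborhood Radius and Warp Rate
    """
    M = len(current_points)
    N = len(current_points[0])
    # First loop (1001-1004): sample and store the level of every point once.
    levels = [[sample(u, v) for (u, v) in column] for column in current_points]
    new_points = [[None] * N for _ in range(M)]
    # Second loop (1005-1011): compute a newPoint for every current point.
    for m in range(M):
        for n in range(N):
            cu, cv = current_points[m][n]
            zm = 0.0             # Zeroth Moment: sum of neighborhood levels
            fm_u = fm_v = 0.0    # First Moment: level-weighted differences
            for dm in range(-NR, NR + 1):
                for dn in range(-NR, NR + 1):
                    j, k = (m + dm) % M, (n + dn) % N
                    pu, pv = current_points[j][k]
                    level = levels[j][k]
                    zm += level
                    fm_u += level * wrapped_delta(pu, cu)
                    fm_v += level * wrapped_delta(pv, cv)
            if zm > 0:
                new_points[m][n] = ((cu + WR * fm_u / zm) % 1.0,
                                    (cv + WR * fm_v / zm) % 1.0)
            else:
                new_points[m][n] = (cu, cv)  # assumed: no move without level
    # The newPoints replace the currentPoints (1012) and are returned (1013).
    return new_points
```

Because all levels are read from the stored `levels` table and all positions from `current_points`, every point is displaced simultaneously, as the adaptation step requires.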

[0394] Warp Grid Adaptation Examples

[0395] FIG. 17 illustrates a single step of the Warp Grid Adaptation Process applied to the Initialized Warp Grid illustrated in FIG. 7. The Warp Rate has been set to 0.5, meaning that the process of adaptation causes each point in the Warp Grid to reposition itself halfway towards the center-of-gravity of its Connectivity Pattern.

[0396] FIG. 18 illustrates three steps of the Warp Grid Adaptation process applied to the Initialized Warp Grid illustrated in FIG. 7, with the Warp Rate set at 0.5. It can be seen that each iteration of the adaptation process causes most of the points in the grid to migrate a small distance on the Digital Image. The migration does not continue indefinitely with additional iterations, but reaches an Equilibrium Configuration after which there is no further significant migration.

[0397] FIG. 19 illustrates the Warp Grid of FIG. 7 following a total of 250 iterations of the Adaptation step. At this point the Warp Grid has reached its Equilibrium Configuration. Most of the grid points will remain stationary with the application of additional adaptation steps. A few of the grid points, most notably those in the dark regions of the picture, will randomly move within small orbits around their equilibrium center with the application of additional adaptation steps. Eventually, with very large numbers of iterations, the Equilibrium Configuration may drift.

[0398] FIG. 20 illustrates the Warp Grid Vectors for the Equilibrium Warp Grid of FIG. 19. Each Warp Grid Vector is drawn as a line emanating from a small square dot, the dot indicating the position of the Grid Point in the Initialized Warp Grid, the line indicating the magnitude and direction of the Grid Point displacement following the application of 250 iterations of the adaptation process. As can be seen from FIG. 18, each point in the Initial Warp Grid generally follows a path taking it in the direction of the closest bright regions of the picture. Points centered in a bright region do not move significantly. Points in dark regions equidistant from bright regions in opposing directions are conflicted and do not move significantly. Remember that points along the edges of the images are, in fact, almost equally distant from the opposite edge because of the toroidal wrap around.

[0399] FIG. 21 illustrates a Digital Image and its corresponding Warp Grid of 96 rows and 64 columns in an Equilibrium Configuration. The figure on the right clearly illustrates that the fine detail of the Digital Image cannot be captured by the fully adapted Warp Grid, although it is clear that employing a finer grid captures far more of the image detail than a coarse grid. The Neighborhood Radius in this example is 1. This is not to be viewed as a shortcoming of the Warp Grid Algorithm as it is not the purpose of the Warp Grid Algorithm to preserve image pictorial content.

[0400] Comparing Adapted Warp Grids

[0401] The degree of similarity of two matched Pictures is determined in large part by the similarity of their Adapted Warp Grids. The degree of similarity of two Adapted Warp Grids is based on the distance they are separated from one another in the multidimensional space of the Warp Grid Vectors, called the Match Distance.

[0402] In order to directly compare two Adapted Warp Grids, their sampling grids must be of the same dimensions and, in general, their Connectivity Patterns should be the same. Furthermore, the number of Warp Algorithm Iterations for each should be the same. Also, their Warp Rate (WR) should be equal or nearly so. Even if all these conditions aren't exactly true, two adapted Warp Grids may be conditionally comparable if adaptation has been allowed to continue until an equilibrium configuration is reached. In that case, the particulars of the Warp Algorithm parameters are not as critical since the equilibrium configuration is primarily dependent on the Composition of the Pictures being matched, secondarily on the Warp Grid Connection Pattern, and quite independent of the speed with which the equilibrium is reached. However, for the remainder of this discussion, we will assume equivalence of all Warp Algorithm parameters for unconditionally comparable Adapted Warp Grids.

[0403] Assume that the Warp Grid is M-by-N, M columns and N rows. As previously described, the Adapted Warp Grid is represented by an M*N dimensional vector, the Warp Grid Vector, whose elements are Displacement Vectors representing the displacements of the Warp Grid points from their initial positions by the Warp Algorithm. Each Displacement Vector is characterized by both u-direction and v-direction displacement components.

[0404] Let p_(m,n) denote the Warp Grid point on the m^(th) column and the n^(th) row of the initial M-by-N Warp Grid. Let q_(m,n) be the corresponding point in the Warp Grid following some number of iterations of the Warp Algorithm. Then the Warp Grid Vector is a vector V of M*N elements v_(m,n), where the elements are the displacement vectors

$v_{m,n} = q_{m,n} - p_{m,n}$

[0405] taken in row-by-row order on the indices of the Warp Grid points.

[0406] Let E and F be two Warp Grid Vectors, each being of dimension M*N and each being generated by a Warp Algorithm of i iterations with Warp Rate WR. Then the magnitude of the difference between E and F is given by the relationship

$\|E - F\| = \sqrt{\sum_{m=1}^{M} \sum_{n=1}^{N} \|E_{m,n} - F_{m,n}\|^{2}}$

[0407] where

$\|E_{m,n} - F_{m,n}\|^{2} = (Eu_{m,n} - Fu_{m,n})^{2} + (Ev_{m,n} - Fv_{m,n})^{2}$

[0408] where Eu_(m,n) denotes the u component of the m,n^(th) displacement vector of E, and Fu_(m,n), Ev_(m,n) and Fv_(m,n) are defined correspondingly.

[0409] The Match Distance between two Warp Grid Vectors E and F is the magnitude of their vector difference normalized by the number of elements in each Warp Grid Vector,

$\mathrm{match}(E,F) = \frac{\|E - F\|}{M \times N}$

[0410] Thus the closeness of match of two Warp Grid Vectors is the average distance between all the corresponding displacement vectors of the Warp Grid Vectors.
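The Match Distance defined above can be computed with a few lines of code. In this sketch a Warp Grid Vector is assumed to be a flat list of (u, v) displacement tuples in row-by-row order, and the function names are hypothetical.

```python
import math


def difference_magnitude(E, F):
    """Magnitude of the difference of two Warp Grid Vectors."""
    return math.sqrt(sum((eu - fu) ** 2 + (ev - fv) ** 2
                         for (eu, ev), (fu, fv) in zip(E, F)))


def match_distance(E, F, M, N):
    """Difference magnitude normalized by the number of elements M*N."""
    if len(E) != M * N or len(F) != M * N:
        raise ValueError("both Warp Grid Vectors must have M*N elements")
    return difference_magnitude(E, F) / (M * N)
```

For two 2-by-2 Warp Grid Vectors that differ only in one displacement vector, by (3, 4), the difference magnitude is 5 and the Match Distance is 5/4 = 1.25.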

[0411] It is also possible to define the Match Distance between two Warp Grid Vectors in alternate ways. For example, the closeness of match between a given Warp Grid Vector E and a Warp Grid Vector F from a database can be based on the magnitude of displacement vector differences weighted by the values of Warp Grid samples at the grid points of E. Letting E_(m,n) and F_(m,n) denote the Displacement Vectors of Warp Grid Vectors E and F respectively, and letting L(p_(m,n)) denote the sampled level of the Digital Image at the point p_(m,n), corresponding to Displacement Vector E_(m,n), a weighted distance measure for E and F becomes the average weighted difference between the corresponding displacement vectors of E and F,

$\mathrm{weighted\_match}(E,F) = \frac{\mathrm{weighted\_difference}(E,F)}{M \times N}$

[0412] where the magnitude of the weighted difference of E and F is equal to

$\mathrm{weighted\_difference}(E,F) = \sqrt{\sum_{m=1}^{M} \sum_{n=1}^{N} L(p_{m,n}) \times \|E_{m,n} - F_{m,n}\|^{2}}$

[0413] The weighted matching criterion is useful in cases where an Equilibrium Configuration of the fully adapted Warp Grid is not particularly stable, the small seemingly random motions of some of the grid points with continued adaptive iterations causing the match distances involved to fluctuate. Examination of the circumstances of these grid point perturbations reveals that they arise in regions in the Digital Image with extremely small sampled values. In that case, the center of gravity of a Connectivity Pattern in the region is particularly sensitive to very small changes in the sampled values at the points of the Connectivity Pattern. The weighted match criterion described above places less emphasis on these “noisier” Warp Grid displacement vectors, yielding a more stable match distance.
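A sketch of the weighted measure follows. It assumes flat lists in row-by-row order for the two Warp Grid Vectors and for the levels sampled at the grid points of E. The function names mirror the names used in the formulas but are otherwise illustrative.

```python
import math


def weighted_difference(E, F, levels):
    """Level-weighted magnitude of the difference of E and F."""
    return math.sqrt(sum(L * ((eu - fu) ** 2 + (ev - fv) ** 2)
                         for L, (eu, ev), (fu, fv) in zip(levels, E, F)))


def weighted_match(E, F, levels, M, N):
    """Weighted difference normalized by the number of elements M*N."""
    return weighted_difference(E, F, levels) / (M * N)
```

A displacement vector whose grid point sampled a level of zero contributes nothing to the result, which is how the measure suppresses the unstable points found in dark regions.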

[0414] Visual Key Matching

A Visual Key Vector is a combination of the Warp Grid Vector and possibly some other vector of Image Measures. So, in general, the number of vectors being compared is greater than just the M*N vectors of the Warp Grid. But not much more, because the Warp Grid is the primary way that Visual Key Vectors separate themselves in space.

[0415] From the preceding discussions it can be concluded that a best match to a given Visual Key Vector may be obtained by pairwise comparing the given Visual Key Vector to all the Visual Key Vectors in the database and noting which one yields the closest match. The question of whether a given database contains a match to a given Visual Key Vector is equivalent to the question of whether the best match in a database is sufficiently close to be considered to have arisen from the same Picture. Thus the matching distance of the best match must be compared to a specified maximum allowable matching distance to be considered to have arisen from the comparing of Visual Key Vectors derived from the same Picture.

[0416] Likewise, when attempting to find all the matching Visual Key Vectors in a database that match a given Visual Key Vector, it is necessary to consider the question of how many matching Visual Key Vectors are sufficiently close to have arisen from the same Picture, a conclusion that can be decided by comparing all the match distances against a suitably chosen threshold match distance.

[0417] Ultimately, we must address the question of the size of the database of Visual Key Vectors and the number of computational steps required to select the best matching Visual Key Vectors. It is the intention of the present invention to minimize both database size and the number of computational steps in selecting Visual Key Vector matches.

[0418] Reducing the Database Size

[0419] The size of the database of Visual Key Vectors is the number of Visual Key Vectors in the database times the number of bytes in a Visual Key Vector. Suppose the Visual Key Vector is composed only of the Warp Grid Vector, and consider the application of the warp grid algorithm to a monochrome picture. If the dimensions of the Warp Grid are 16-by-16, and if the u and v components of a Displacement Vector each requires 8 bytes for floating point representation, then the size of a Visual Key Vector is 16*16*2*8 bytes or 4 Kilobytes. If the database consists of 1 million Visual Key Vectors, then its size is 4 Gigabytes.

[0420] If we are required to find the best matching Visual Key Vector from a database, each Visual Key Vector in the database will need to be compared to the corresponding vector of the Query Visual Key Vector. For the example in the preceding paragraph, that would represent 16*16*2*1 million 8-byte comparisons. If each 8-byte comparison took ten nanoseconds (10⁻⁸ seconds) then a best match search of the database of 1 million would take 5.12 seconds, disregarding any other necessary computations required for determining the match distance.
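The storage and search-time arithmetic of the two preceding paragraphs can be reproduced with the short sketch below. The function names are hypothetical, and the defaults simply restate the figures assumed in the text (8 bytes per component, ten nanoseconds per comparison).

```python
def key_size_bytes(grid_m, grid_n, bytes_per_component=8):
    """Bytes in a Visual Key Vector: one u and one v component per grid point."""
    return grid_m * grid_n * 2 * bytes_per_component


def full_scan_seconds(grid_m, grid_n, num_keys, seconds_per_comparison=1e-8):
    """Time to compare every component of every key in the database once."""
    return grid_m * grid_n * 2 * num_keys * seconds_per_comparison


size = key_size_bytes(16, 16)                      # 4096 bytes per key
database_bytes = size * 1_000_000                  # about 4 Gigabytes
scan_time = full_scan_seconds(16, 16, 1_000_000)   # about 5.12 seconds
```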

[0421] The 1 million estimate of database size is modest by present day standards, and the estimate of the speed of comparison is optimistic. Therefore, it must be concluded that the present invention so far disclosed would work best for small databases or relatively slow searches. Clearly the questions must be posed as to how small a Visual Key Vector will suffice to allow positive identification and how unnecessary comparison operations may be eliminated to speed up database searches for matching Visual Key Vectors.

[0422] One way to reduce the size of a Visual Key Vector is to reduce the size of the Warp Grid. From the preceding example, an 8-by-8 grid would require 1 Kilobyte of storage while a 4-by-4 grid would require 256 bytes. The question that needs to be posed is whether Picture matching using a 4-by-4 Warp Grid would work, assuming 1 million Visual Key Vectors in the database.

[0423] To answer the above question we might start by asking another question: “For what categories of Picture Composition would the proposed invention fail to yield a satisfactory result?” One surprisingly simple answer is that a Warp Grid Algorithm fails to discriminate between Pictures where the Warp Grid sampled pixel values are all the same. In that case the adaptation step yields all zero displacement vectors since the center of gravity of each given grid point's Connection Pattern is coincident with the given grid point (assuming a symmetric Connection Pattern). Of course, if a Picture's Composition is a uniform value, we might be inclined to accept a number of “Uniform Pictures” as being equivalent as far as the Warp Grid Algorithm is concerned. But with Warp Grids as small as 4-by-4, a number of non-Uniform Pictures from amongst the 1 million Pictures in the database are likely to be confused with Uniform Pictures. For example, sampling the same value at all 16 sampling points might commonly occur when a Picture's Composition represents the image of an irregularly shaped opaque object displayed against a uniform contrasting background. (See FIG. 23 for an illustration of this example). Furthermore, the seemingly Uniform Picture is just one example of a class of Pictures that are not satisfactorily handled when the Warp Grid dimensions are reduced to small positive integers. The fewer the number of points, the more problematical becomes the initial positioning of the points in the picture, and the more pathological cases there are.

[0424] The above example is typical of the kinds of unexpected results that can occur when attempting to match Digital Images based on a very small number of pixel samples. Indeed, experiments have shown that the quality of matches improves as the Warp Grid dimensions increase. This said, how can we reduce the storage requirements of our database?

[0425] One answer is surprisingly simple and turns out to be very satisfactory. Use a relatively fine Warp Grid in the Warp Grid Algorithm but sample the adapted Warp Grid points in creating the Visual Key Vector. It can be immediately appreciated that a 4-by-4 sample of a 16-by-16 adapted Warp Grid is not the same as a 4-by-4 adapted Warp Grid. For example, a 16-by-16 grid will match to a Uniform Picture only when all 256 sampled pixels are the same value, thus yielding a much lower likelihood of an erroneous match than if the number of pixel samples were only 16. But more importantly, a typical Connection Pattern defined on a fine Warp Grid will be bound to only a small region of the Picture Composition, while a Connection Pattern on a very coarse grid will necessarily span most of the Picture. Thus the points in the fine grid will be much more sensitive to local variations than the points in the very coarse grid. And when we are attempting to distinguish from among a million or more pictures, it becomes necessarily the case that it is in the fine details that the best of the closest matching pictures is determined.
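One simple way to take a 4-by-4 sample of a 16-by-16 adapted Warp Grid is to keep every fourth point in each direction. The regular stride is an assumption made for this sketch, since the text does not say which points are selected, and the function name is hypothetical.

```python
def subsample_grid(displacements, step):
    """Keep every `step`-th Displacement Vector in both grid directions.

    displacements -- nested list indexed [m][n] of displacement vectors
                     taken from the fully adapted (fine) Warp Grid
    """
    return [column[::step] for column in displacements[::step]]
```

The retained vectors still come from an adaptation run on the fine grid, which is what distinguishes this sample from a grid that was only 4-by-4 to begin with.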

[0426] Sampling the Warp Grid is surprisingly effective in creating Visual Key Vectors with significant discrimination power. Part of the reason for this lies in the “holographic” nature of the Warp Vectors in an adapted Warp Grid. It can be appreciated that, at each iteration of the Warp Grid Algorithm, the influence of any given grid point is propagated through the Warp Grid to adjacent points in its Connection Pattern. When the number of Warp Algorithm Iterations is comparable to the Warp Grid dimensions, the influence of a single grid point is felt by every grid point. It is through the process of adaptively balancing all of the influences from all of the grid points that an equilibrium configuration of grid points is reached. Thus each Displacement Vector in the Warp Grid carries information regarding the totality of all pixel values sampled through the iterative steps of the Warp Algorithm.

[0427] That is why a given selection of the Displacement Vectors of an adapted Warp Grid is so effective at differentiating Picture Compositions. Even though a given Displacement Vector in the selection is not itself directly sampling a given region of the Picture, the given Displacement Vector is nevertheless influenced by those pixel levels in the given region of the Digital Image which are sampled by other unselected Grid Points in the Warp Grid.

[0428] In addition to sampling the Warp Grid, the database of Visual Key Vectors can be reduced in size by reducing the number of bytes necessary to represent a Warp Grid Vector. In the previous assumption, each Warp Grid Vector Component required 8 bytes for storing as a floating point number. If we were to store each component as a 2 byte integer, that would save 75 percent of the required storage space. Rather than having the very fine grained resolving power of a floating point number, we would only be able to resolve a vector component to one part in 2¹⁶ (64K). Would this have an adverse effect on the Picture matching performance of the Warp Grid Algorithm? No, because the matching distance computed for pairs of Warp Grid Vectors during match searching is generally very much larger than the one part in 64K, and quantizing the vector component to 64K levels introduces only a very tiny variation on match distances.
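The 2-byte representation might be sketched as follows. The component range of -1 to 1 is an assumption made for illustration, since the range of displacement components depends on how the u,v coordinate system is scaled. The function names are hypothetical.

```python
import struct


def quantize(x, lo=-1.0, hi=1.0):
    """Map a component in [lo, hi] to an integer in 0..65535 (2**16 levels)."""
    x = min(max(x, lo), hi)  # clamp values outside the assumed range
    return round((x - lo) / (hi - lo) * 65535)


def dequantize(q, lo=-1.0, hi=1.0):
    """Recover an approximate floating point component from its integer code."""
    return lo + q / 65535 * (hi - lo)


def pack_component(x):
    """Store one component in 2 bytes as a little-endian unsigned integer."""
    return struct.pack("<H", quantize(x))
```

Under the assumed range, the round-trip error is at most half a quantization step, that is about 1.5e-5, which is small relative to typical match distances.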

[0429] Another way to reduce the size of Visual Key Vectors is to store a subset of the sampled grid points, for example, keeping only the ones whose displacement vectors have the maximum values. This allows us to retain only the most valuable information in the Stored Visual Key Vector. Thus we draw a distinction between the Stored Visual Key Vector and the Full Visual Key Vector. A database query uses a Full Visual Key Vector, which contains the full set of vectors, whereas the Stored Visual Key Vector only contains the most useful vectors. Since the Stored Visual Key Vector also retains information as to the original location of each vector, a Full Visual Key Vector can be compared meaningfully with each Stored Visual Key Vector.

[0430] Reducing the Number of Search Steps

[0431] How can the number of computational steps required to search the database for matches be reduced? One way is to eliminate unnecessary computation. For example, if we are searching for a best match and the smallest match distance found so far is small, then while pairwise matching the vectors at a given record in the database, we can stop as soon as the accumulated distance exceeds that “small” value and move on to the next record. Similarly, if we preorder all of the records in a database according to a chosen vector, then there are a number of logically constructed processes that will eliminate computational steps by eliminating whole ranges of records from the computational process.
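A minimal sketch of the early-termination idea follows. The sum of squared differences is assumed as the match distance purely for illustration, and the function names and sample records are hypothetical.

```python
def distance_with_cutoff(query, candidate, cutoff):
    """Accumulate squared differences; give up (return None) once cutoff is exceeded."""
    total = 0.0
    for q, c in zip(query, candidate):
        total += (q - c) ** 2
        if total > cutoff:
            return None  # this record cannot beat the best match found so far
    return total

def best_match(query, records):
    """Scan all records, abandoning each comparison as early as possible."""
    best_index, best_distance = None, float("inf")
    for index, candidate in enumerate(records):
        d = distance_with_cutoff(query, candidate, best_distance)
        if d is not None and d < best_distance:
            best_index, best_distance = index, d
    return best_index, best_distance

records = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.9, 1.0, 1.1]]
result = best_match([1.0, 1.0, 1.0], records)
print(result)
```

Every record is still visited, which is why the text treats this as pruning rather than as a change of search strategy.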

[0432] Another way of eliminating unnecessary computation is by selecting a Minimal Acceptable Match Score, and continuing the search only while the Match Score of the last Visual Key Vector compared falls below the Minimal Acceptable Match Score.

[0433] We refer to the techniques suggested above as “pruning” the computational space, in that we start by assuming that every Visual Key Vector in the database will need to be examined in the match search. Then we logically create procedures that will eliminate some of them under certain conditions that we test for as we are individually comparing each Visual Key Vector.

[0434] The match search algorithm employed by the present invention takes a very different approach to eliminating unnecessary computational steps. Rather than assuming the worst (examine every Visual Key Vector in the database) and working to make the situation better by pruning away unnecessary computation, we begin by assuming the best (the first Visual Key Vector we look at from the database matches to within the specified tolerance) and we do additional work only if it does not. We next examine that Visual Key Vector which has the next highest probability of being a match. Additional work is done only when the preceding step fails. At each step, the next most likely matching Visual Key Vector is examined. The n^(th) step is required only when the preceding n−1 steps have failed to yield a match, where each of the previous n−1 steps optimized the probability of a match.

[0435] The Index Key

[0436] To implement a more efficient search for a matching Visual Key Vector, all of the Visual Key Vectors in the database are given an index number. Visual Key Vectors in the Visual Key Collection are sorted according to this index number. Visual Key Vectors are then selected from the Visual Key Collection by performing a binary search on their sorted indices for a desired index. The index number for a Visual Key Vector is computed from the Visual Key Vector itself by a method that will be described later in this section. There may be more than one Visual Key Vector in the database with the same index. Index numbers in the database are not sequential; there are frequently large gaps in the indices between adjacent Visual Key Vectors. The index of a Visual Key Vector is referred to as an Index Key.
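The sorted-index lookup described above might be sketched as follows, using Python's standard bisect module for the binary search. The class name, the integer Index Keys, and the sample vectors are hypothetical.

```python
from bisect import bisect_left, bisect_right

class VisualKeyCollection:
    """Visual Key Vectors kept sorted by Index Key; duplicates and gaps are allowed."""

    def __init__(self, records):
        # records: iterable of (index_key, visual_key_vector) pairs
        self._records = sorted(records, key=lambda record: record[0])
        self._index_keys = [record[0] for record in self._records]

    def lookup(self, index_key):
        """Return every stored vector whose Index Key equals index_key."""
        lo = bisect_left(self._index_keys, index_key)
        hi = bisect_right(self._index_keys, index_key)
        return [vector for _, vector in self._records[lo:hi]]

collection = VisualKeyCollection([
    (900000, [0.4, -0.1]),
    (1042, [0.2, 0.3]),
    (7, [-0.5, 0.0]),
    (1042, [0.25, 0.28]),   # a second vector sharing the same Index Key
])
hits = collection.lookup(1042)
misses = collection.lookup(8)   # falls in a gap between stored indices
print(hits, misses)
```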

[0437] An Index Key is derived from a Visual Key Vector by sampling pre-selected measurements from the Visual Key Vector (referred to as Visual Key Vector Elements) and quantizing these pre-selected measurements to a small number of intervals. These Visual Key Vector Elements are referred to as Tumblers in the context of producing Index Keys. Each one of these Tumblers is given a discrete value based on the value of the corresponding Visual Key Vector Element, quantized to the desired number of intervals (referred to as Bins).

[0438] Various criteria may be used for selecting the Tumblers from the Visual Key Vector to be used in producing the Index Key. For example, one could select Tumblers based on their color value, hue, intensity, geographical location, etc. Which criterion or combination of criteria is chosen would depend on the specific application, especially the general nature of the Pictures in the database. In fact, a Visual Key Database could be indexed in multiple ways, adding to the flexibility and effectiveness of the system. For example, looking through the database for a black and white image might be more effectively done via intensity-based Index Keys, rather than R,G,B-based Index Keys.

[0439] So an Index Key, obtained by sampling and quantizing a Visual Key Vector, consists of G ordered Tumblers representing the ordered quantized sample. Each Tumbler has a discrete value corresponding to the quantization level (Bin) of the quantized vector. For example, an integer j in the range 1 to B may represent the B possible Bins of a vector element. Within this document, our examples show 10 Tumblers divided into 5 Bins; however, both the number of Tumblers and the number of Bins can be varied for performance optimization.

[0440] To quantize a Tumbler to 5 levels, the a priori Probability Density Function (PDF) of the frequency of occurrence of a given Tumbler level is subdivided into 5 regions, or bins, by 4 slicing levels, as shown in FIG. 22. The a priori PDF of a Visual Key Vector measurement is derived from the statistical analysis of a very large number of Visual Key Vector Elements taken from a very large number of different Pictures. The slicing levels are selected to give equal chances to each Bin of containing a randomly selected Tumbler. If the 5 Bins are represented by 5 symbols (for example, the numerals 0 through 4), then each symbol will be equally likely to represent a given Tumbler. By equalizing the frequency of occurrence of each symbol, we maximize the amount of information each symbol contains.
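One way to obtain equal-probability slicing levels is to take empirical quantiles of a training sample, as in the sketch below. The synthetic Gaussian sample stands in for measurements from a large Picture collection, and the function names are hypothetical.

```python
import random
from bisect import bisect_right
from statistics import quantiles

def slicing_levels(samples, bins=5):
    """Return bins-1 cut points that split the samples into equally likely Bins."""
    return quantiles(samples, n=bins)

def bin_of(value, levels):
    """Return the Bin symbol (0 .. len(levels)) for one Tumbler value."""
    return bisect_right(levels, value)

# Synthetic stand-in for Visual Key Vector Elements from many Pictures.
rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(50_000)]

levels = slicing_levels(training)
counts = [0] * 5
for value in training:
    counts[bin_of(value, levels)] += 1
print(levels, counts)
```

With this choice each of the five symbols occurs about equally often in the training data, which is the equalization the paragraph above calls for.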

[0441] The a priori PDF of a Visual Key Vector Element (which is a Displacement Vector) is ideally a Gaussian distribution with zero mean. A zero mean is assumed because there are no preferred directions in an arbitrary Picture, and there are no edge effects in the Warp Grid since it is toroidally wrapped. But actual PDFs of Warp Grid Vectors of real populations of pictures may vary from the ideal and become elliptical (u and v correlated), disjoint, or displaced (non-zero mean). In these cases the performance of the index keys to address the database of Visual Key Vectors will be compromised unless adequate care is taken to normalize variations from the ideal case described above.

[0442] Index Keys and Tumbler Probabilities

[0443] When a Query Picture is presented to the system for matching to a database object, an Index Key is prepared for that picture. Because a Query Picture is generally somewhat different from the Best Matching Picture in the Visual Key Database, a given Tumbler's Bin in the Index Key of the Query Picture (referred to as the Query Index Key) may or may not match the corresponding Tumbler's Bin in the Index Key of a Matching Picture. That is, a given Tumbler's Bin in a Query Index Key is either correct or wrong. A Tumbler's Bin is correct if the quantization level of its corresponding Visual Key Vector Element is the same as the quantization level of the corresponding Visual Key Vector Element for the Matching Picture; otherwise it is wrong.

[0444] A Tumbler Probability function associates a Tumbler Probability between 0 and 1 to a Tumbler Bin, and represents the probability that the Bin of the Tumbler is correct. Referring to FIG. 24, we see the basic process of generating Tumbler Probabilities 2400.

[0445] The same set of Tumblers is sampled from the Visual Key Vector as was used to create the Index Keys in the Visual Key Database originally 2401. In other words, a pre-selected set of Visual Key Vector Elements is used to produce Index Keys and Query Index Keys alike. A DO loop is established to go through each of these selected Tumblers in order to generate their corresponding Tumbler Probabilities 2402.

[0446] For each of the Tumblers, we construct a set of Tumbler Probabilities (one for each Bin) whose value represents the probability that the Tumbler falls into that particular Bin 2403. These Tumbler Probabilities are then sorted in order of decreasing probability 2404. When each of the Tumblers has been processed, the DO loop is ended 2405 and the stream of Tumbler Probabilities is returned 2406.

[0447] For each of the G Tumblers (denoted T₁ to T_(G)) comprising a Query Index Key, we construct a set of B Tumbler Probabilities. The Tumbler Probability TP_(g,b) (where g=1 to G, b=1 to B) is computed to be the conditional probability Prob_(g){b|i} that the g^(th) Tumbler's correct bin is b given that the Tumbler's actual bin is i, where i=1 to B. The Tumbler Probability TP_(g,b) is calculated from Bayes' rule as: $TP_{g,b} = \mathrm{Prob}_{g}\{b \mid i\} = \frac{\mathrm{Prob}_{g}\{b,i\}}{\mathrm{Prob}_{g}\{i\}}$

[0448] where Prob_(g){b,i} is the joint probability that the correct Tumbler Bin is b and the actual Tumbler Bin is i, and Prob_(g){i} is the a priori probability that the g^(th) Tumbler is in bin i. Note: We are also interested in computing the conditional probability TP_(g,b) using the continuous conditional probability cProb_(g){b|w}, the probability that the g^(th) Tumbler's correct bin is b given that the corresponding Visual Key Vector Element is actually w, where w is essentially unquantized and continuously varying: $TP_{g,b} = \mathrm{cProb}_{g}\{b \mid w\} = \frac{\mathrm{cProb}_{g}\{b,w\}}{\mathrm{cProb}_{g}\{w\}}$

[0449] Here, cProb_(g){b,w} is the joint probability density function that the correct Tumbler Bin is b and the actual value of the corresponding Visual Key Vector Element is w, and cProb_(g){w} is the a priori probability density function that the g^(th) Tumbler's corresponding Visual Key Vector Element is w. The choice of which conditional probability to compute is left to the requirements of the specific application. In general, Prob_(g){b|i} is easier to compute than cProb_(g){b|w} but is not as accurate.
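The discrete form of the calculation can be sketched as below. The joint table is a hypothetical three-Bin example chosen for brevity (the examples in this document use five Bins), Bins are numbered from 0 in the code, and the function name is an assumption.

```python
def tumbler_probabilities(joint, actual_bin):
    """Return TP_(g,b) for every correct bin b, given the Tumbler's actual bin.

    joint[b][i] is an estimate of Prob_g{b,i}: the joint probability that the
    correct Bin is b and the actual (query) Bin is i.
    """
    prior = sum(row[actual_bin] for row in joint)     # Prob_g{i}
    return [row[actual_bin] / prior for row in joint]  # Prob_g{b|i}

# Hypothetical joint probabilities for one Tumbler (rows: correct bin b,
# columns: actual bin i). The entries sum to 1.
joint = [
    [0.25, 0.05, 0.00],
    [0.05, 0.30, 0.05],
    [0.00, 0.05, 0.25],
]
tp = tumbler_probabilities(joint, actual_bin=1)
print(tp)
```

In this example the Bin actually observed is the most probable correct Bin, with the two neighbouring Bins sharing the remaining probability.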

[0450] Although, in the present discussion, we have chosen to illustrate the case where all Tumblers have the same number of Bins, there is nothing in the following discussion which would preclude the application of the methodology to those cases where different Tumblers in an Index Key have different numbers of Bins. The following description of the methodology is fully consistent with this alternative condition.

[0451] A Query Index Key is correct if all its G Tumblers identically match the G Tumblers of its Best Matching Picture in the Visual Key Database. The probability of any given Index Key being correct may be computed as the joint probability of all of its Tumbler Probabilities. It is not unreasonable to assume that the individual Tumbler Probabilities are statistically independent when the sampling for the Index Key is selected so that the individual selected Tumblers are well separated spatially and/or functionally. Furthermore, it must be assumed that the individual pictures that give rise to the Visual Key Database are independent and uncorrelated. Assuming independence, the probability of any given Index Key being correct may be computed as the product of all of its Tumbler Probabilities. This assumption is not unreasonable for many picture collections, but is not well suited to streaming media, where individual frames are highly correlated for the reason that they must convey the illusion of continuous motion. The subject of Index Keys for streaming media will be covered later on in this disclosure.
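Under the independence assumption the calculation reduces to a product, as the short sketch below shows. The per-Tumbler probability of 0.9 is a hypothetical figure, not a measured one.

```python
from math import prod

def index_key_probability(tumbler_probabilities):
    """Probability that every Tumbler is in its correct Bin, assuming independence."""
    return prod(tumbler_probabilities)

per_tumbler = 0.9  # hypothetical probability that any one Tumbler's Bin is correct
for g in (5, 10, 20):
    print(g, round(index_key_probability([per_tumbler] * g), 4))
```

Even with each Tumbler individually likely to be correct, a ten-Tumbler key in this example is entirely correct only about 35 percent of the time, and the figure falls further as Tumblers are added.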

[0452] Preferred Search Algorithm

[0453] As the number of different Pictures represented in a Visual Key Database increases, the number of Tumblers in an Index Key must increase to permit different Pictures to have different Index Keys. As Index Key size increases, the probability that it is wrong on any given comparison increases, meaning that one or more of the Tumblers in the Index Key is in the wrong Bin. Therefore, we will wish to search the nearby space of possible Index Keys starting with those that have the highest probability of being correct. The simple-minded approach would be to construct all possible Index Keys and sort them by their probabilities. One could then iterate on the sorted list considering each Index Key in turn, starting with the most probable. The space of all Index Keys can be quite large, as it is equal to B raised to the G^(th) power if each Tumbler has B states. For most practical cases, the simple-minded approach is virtually infeasible.
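The simple-minded approach can be written down directly for a toy case, which also makes its cost visible. The three two-Bin Tumblers and their probabilities are hypothetical, and Bins are numbered from 0.

```python
from itertools import product
from math import prod

def all_index_keys_sorted(tumbler_probs):
    """Enumerate every Index Key and sort by decreasing probability.

    tumbler_probs[g][b] is the Tumbler Probability of Bin b for Tumbler g.
    """
    bin_ranges = [range(len(probs)) for probs in tumbler_probs]
    keys = []
    for key in product(*bin_ranges):
        probability = prod(probs[b] for probs, b in zip(tumbler_probs, key))
        keys.append((probability, key))
    keys.sort(key=lambda item: -item[0])
    return keys

# Hypothetical Tumbler Probabilities: G = 3 Tumblers, B = 2 Bins each.
ranked = all_index_keys_sorted([[0.7, 0.3], [0.6, 0.4], [0.9, 0.1]])
print(ranked[:2])

# Size of the full key space for the 10-Tumbler, 5-Bin examples in this document.
print(5 ** 10)  # 9765625
```

All B^G keys must be built and sorted before the first one can be examined, which is the cost the approach described next avoids.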

[0454] In order to efficiently search in what may potentially be a huge Index Key space, the present invention takes a novel approach, which can be summarized in five steps (outlined below).

[0455] 1. Compute an Index Key from the Visual Key Vector of the Query Picture's Digital Image. This computed Index Key is the most likely index of the Visual Key Vector in the Visual Key Database that most closely matches the Visual Key Vector of the Query Digital Image.

[0456] 2. Locate a Visual Key Vector in the database Visual Key Collection with Index Key equal to the Index Key computed in step 1. If there is no identical Index Key in the database, then go to step 5.

[0457] 3. Compare the Visual Key Vector selected at the Index Key to the Visual Key Vector of the Query Picture's Digital Image.

[0458] 4. If the comparison of step 3 results in too low a Match Score, then repeat step 2 to see if there is another Visual Key Vector in the database with an index that is identical to the Index Key computed in step 1.

[0459] 5. If the present Index Key does not appear among the indices of the Visual Key Vectors in the database, construct a new Index Key which is the next best guess at the index of a matching Visual Key Vector, and go back to step 2.
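The five steps above can be sketched as a single loop. Everything named here is an assumption for illustration: the database is a plain dictionary from Index Key to stored vectors, candidate_keys is any iterable that yields Index Keys from most to least probable, match_score is a caller-supplied function, and max_keys is an added bound on how many keys to try before giving up.

```python
def search(query_vector, candidate_keys, database, match_score, min_score, max_keys=1000):
    """Try Index Keys in decreasing order of probability until a vector matches."""
    for tries, key in enumerate(candidate_keys):       # step 1, then step 5
        if tries >= max_keys:
            break
        for stored in database.get(key, ()):           # step 2 (and step 4 repeats)
            if match_score(query_vector, stored) >= min_score:  # step 3
                return key, stored
    return None

def negative_squared_distance(a, b):
    """Hypothetical Match Score: higher (closer to zero) means a closer match."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

database = {
    (0, 1): [[0.0, 0.0]],
    (1, 1): [[1.0, 1.0], [5.0, 5.0]],
}
found = search(
    query_vector=[5.1, 4.9],
    candidate_keys=[(0, 0), (0, 1), (1, 1)],   # most probable first
    database=database,
    match_score=negative_squared_distance,
    min_score=-0.1,
)
print(found)
```

Here the first key is absent from the database, the second holds only a poor match, and the third holds the matching vector as its second entry.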

[0460] Squorging

[0461] We do not wish to enumerate the entire space of possible Index Keys, as we will only be interested in a very small percentage of them which occupy the space near the given Index Key. Instead we produce a sequence of Index Keys one at a time, starting with the one with the highest probability, and sequentially generate the next most probable Index Key at each iteration. By using a “pull” methodology, we only perform as much computation as is necessary to produce the next most probable Index Key. We have given the name “Squorging” to this unique pull methodology, “Squorge” being loosely derived from the words “Sequential Generation”.

[0462] Squorging makes use of a recursive decomposition of the problem of sequentially generating the next most probable Index Key. An Index Key of size G may be constructed by putting together two “half” Sub-Index Keys. By taking all cross combinations of the Sub-Index Key comprised of TP₁ to TP_(i) with the Sub-Index Key of TP_(i+1) to TP_(G), where i=G//2 (integer division), one can construct all Index Keys comprised of TP₁ to TP_(G). We apply this recursively, “halving” the Sub-Index Keys until we are combining individual Tumbler Probabilities.

[0463] If we start with two lists of either Sub-Index Keys or Tumbler Probabilities, where each list is sorted by decreasing probability, we will observe that those combinations with the higher joint probabilities will come from combining those items near the beginning of the input lists. This observation is what we will use in the Squorge methodology to be described herein.

[0464] Recursion Flowchart

[0465] A new Squorger is created by connecting its inputA and inputB to two streams. Each Squorger input stream may either be another Squorger or a Tumbler Probabilities Stream.

[0466] FIG. 25 is a flowchart showing the process of recursively handling streams of Tumbler Probabilities (TP). For each of its inputs, a Squorger is requested for a stream of Tumbler Probabilities 1 through G wide by B deep 2500. The variable G corresponds to the number of Tumblers in the Index Key; the variable B corresponds to the number of Tumbler Probabilities for each Tumbler. If G is 1 2501, a Squorger is not needed, and so a Stream is created on a Tumbler Probability B deep 2502; then that Stream is returned 2503.

[0467] If G is not 1, the collection of Tumbler Probabilities is split in half 2504. The first half will be 1 through i, where i is one half of G (using integer division); the second half will be i+1 through G. For each half, another Squorger is requested for each of its input streams; the variables aStream 2505 and bStream 2506 are set for the firstHalf and secondHalf, respectively. Each of these will be either a Squorger or a Stream, depending on where in the overall Squorger tree we are at the moment.

[0468] At this point, the variable squorger is initialized to a new Squorger, using aStream and bStream as its inputs 2507. The new squorger is then initialized with all of the variables necessary for it to do its job 2508, and the squorger itself is returned 2509.

[0469] To summarize the nature of this recursive method: at each level, it tests for the following terminating condition of the recursion: if a half collection is just a single stream of Tumbler Probabilities, it sets the input to the corresponding Tumbler Probabilities Stream. Otherwise, the input is set to another Squorger. So, at the lowest levels of a Squorging tree, the inputs will all be Streams of Tumbler Probabilities and the outputs will be Sub-Index Keys; at the highest level, the inputs will all be Squorgers and the output will be a stream of full Index Keys.
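The shape of the tree produced by this recursion can be sketched without any of the Squorger machinery. The nested tuples below are only a hypothetical way of describing the tree; "Stream" marks a single Tumbler Probabilities Stream and "Squorger" marks a node with two inputs.

```python
def build_tree(first, last):
    """Describe the Squorger tree for Tumblers first..last (1-based, inclusive)."""
    count = last - first + 1
    if count == 1:
        return ("Stream", first)               # terminating condition
    i = first + count // 2 - 1                 # end of the first half (integer division)
    return ("Squorger", build_tree(first, i), build_tree(i + 1, last))

def leaves(tree):
    """List the Tumbler numbers at the bottom of the tree, left to right."""
    if tree[0] == "Stream":
        return [tree[1]]
    return leaves(tree[1]) + leaves(tree[2])

tree = build_tree(1, 10)
print(leaves(tree))
print(build_tree(1, 3))
```

For three Tumblers the first input is a single Stream and the second is a Squorger over two Streams, so the larger half falls on the second input whenever the collection has an odd size.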

[0476] Basic Squorger Operation

[0477] The basic operation of a Squorger is illustrated in FIG. 26a. The Squorger takes two input streams of Tumbler Probabilities 2601, 2602 from an Index Key and produces a stream of Index Keys 2603 which are variations of the original Index Key, starting with the most likely one. The original Index Key in the illustration is ten elements long. The Squorger 2604 takes each half (5 elements long) into each of its input streams and combines them into new Index Keys of 10 elements each, the same length as the original 2605.

[0478] Recursive Squorger Decomposition

[0479] The preceding description gives a top-level view of how a Squorger functions for an Index Key of 10 Tumblers. The diagram in FIG. 26b illustrates a Recursive Squorger Tree 2650. This is identical to the Squorger shown in FIG. 26a, except that here the breakdown of internal combinations is shown, revealing the recursive, nested nature of Squorger operation.

[0480] A tree of Squorgers is created using as inputs the collection of G streams of Tumbler Probabilities 2651 corresponding to the G Tumblers in the Index Key 2652; each Tumbler Probability stream is ordered by decreasing probabilities.

[0481] The tree of Squorgers is created by a recursively-applied methodology, at each point dividing the collection of input streams into a firstHalfCollection and secondHalfCollection. Where the collection is evenly divisible, firstHalfCollection and secondHalfCollection will be equal in size 2653; where the collection is not evenly divisible, secondHalfCollection will be one greater than firstHalfCollection 2654.

[0482] So the Squorgers farthest down in the tree have streams of Tumbler Probabilities as inputs, or a stream of Tumbler Probabilities for one input and a Squorger for another, where the collection cannot be evenly divided 2654. Those farther up in the tree typically have Squorgers for both inputs 2655.

[0483] At each level, the Squorger puts out a Sub-Index Key whose size is that of the sizes of its inputs combined 2656. At the final output, what emerges from the Squorger is an Index Key, the same size as the original Index Key 2657.

[0484] Squorger Algorithm

[0485] The Squorger algorithm makes use of the variables summarized in Table 1 below.

TABLE 1 Squorger Variables

Variable | Type | Description
inputA | Squorger or Stream of Tumbler Probabilities | The source for the first half of each Index Key being constructed. The values are expected in order of decreasing probability.
inputB | Squorger or Stream of Tumbler Probabilities | The source for the second half of each Index Key being constructed. The values are expected in order of decreasing probability.
listA | Sorted Collection of Sub-Index Keys ordered by decreasing probability | The list of Sub-Index Keys that have already been retrieved from inputA.
listB | Sorted Collection of Sub-Index Keys ordered by decreasing probability | The list of Sub-Index Keys that have already been retrieved from inputB.
connectionCounts | Ordered Collection of Integers | A parallel collection to that of listA. Each value gives how many elements from listB have already been combined with the corresponding element of listA.
firstNonFullyConnectedSlot | Integer | Index of the first slot in listA whose element has not yet been combined with every element in listB.
firstUnConnectedSlot | Integer | Index of the first slot in listA whose element has yet to be connected to any element of listB.
sizeA | Integer | Cached value of the total number of elements that could be provided by inputA.
sizeB | Integer | Cached value of the total number of elements that could be provided by inputB.

[0486] Squorger Initialization

[0487] Each newly created Squorger needs to be initialized by initializing all nine Squorger variables shown in Table 1. The inputA and inputB variables are set to the incoming streams, either another Squorger or a stream of sorted Tumbler Probabilities. The sizeA and sizeB variables are the maximum number of elements that could possibly be retrieved from each input stream. Both of the internal lists, listA and listB, are created as Ordered Collections and pre-charged with the first element from each input stream. Each list is dynamic; the memory allocation for each list is continuously adjusted to the number of elements in the list. The connectionCounts variable is also initialized to an Ordered Collection and given a single element whose value is 0. The zero value is in parallel with the first element in listA, and represents that this element has been cross connected with none of the elements in listB.

[0488] Continuing with the initialization of a Squorger: the first slot in listA is not fully connected since it doesn't as yet have any connections, hence the variable firstNonFullyConnectedSlot is set to 1. And similarly, the first slot in listA is the first slot that has no connections, hence the variable firstUnConnectedSlot is set to 1. With that, the Squorger is initialized and ready for use.

[0489] Squorger Control

[0490] A Squorger is commanded via just two messages. In response to the size message, a Squorger answers how many Index Keys could possibly be retrieved from the Squorger. In response to the next message, a Squorger answers a single Index Key.

[0491] Squorger Size

[0492] Since the possible Index Keys are derived from crossing all possible pairs from the two input streams, the size method is implemented quite simply, as answering the product of the two input stream sizes:

size=sizeA*sizeB

[0493] Squorger Next

[0494] Concatenating a Sub-Index Key from listA with a Sub-Index Key from listB produces the next Index Key that the Squorger returns. The Squorger must, however, decide which two elements to concatenate. This requires looking at various listA-listB pairs and selecting the one that has the highest probability. As the next most likely possibility is required, it is requested of the Squorger by sending it the next message. The elements in each list are ordered by decreasing probabilities. Furthermore, a Squorger always returns the combined elements in order of decreasing probability. Because of this, and because the probability of a concatenated Index Key is equal to the product of the probabilities of its two elements, for any given element in listA, we will always connect it with elements 1 through n in listB before connecting with element n+1 in listB. We keep track of this in the variable connectionCounts. For each slot in listA, connectionCounts holds the number of elements from listB that have already been combined with it. So in general, the value in connectionCounts at a given index into listA gives the index of the last element in listB that has been connected to that element from listA.

[0495] Once an element from listA has been combined with every element from listB, we no longer need concern ourselves with that element. Only elements from listA at or beyond the firstNonFullyConnectedSlot are of interest. Also, the firstUnConnectedSlot in listA has yet to be connected to the first slot in listB. When it is, its probability will necessarily be at least as high as that of connecting any subsequent slot in listA. So we have no interest as yet in any elements beyond the firstUnConnectedSlot in listA. The last slot we are interested in right now is the firstUnConnectedSlot if it exists. If we've managed to reach the end of inputA, then the firstUnConnectedSlot will have advanced beyond sizeA. In that case we only want to go as far as the firstUnConnectedSlot or sizeA, whichever is smaller.

[0496] Detail of Squorger Next Method

[0497] With this understanding, we can now see how the Squorger next method is implemented 2700. This is illustrated in the flowchart in FIG. 27.

[0498] We start by setting up a DO loop over the interval of interest in listA from firstNonFullyConnectedSlot to the minimum of firstUnConnectedSlot or sizeA. Each time around the loop, we select this element in listA and store its index in a temporary variable indexA 2701.

[0499] The expression listA at indexA gives the element from listA we'll be using 2701. It will either be a Sub-Index Key or a single Tumbler Probability. Since connectionCounts at indexA gives the index in listB of the last connected element, listB at ((connectionCounts at indexA)+1) gives the next element from listB we'll be using (again, either a Sub-Index Key or a Tumbler Probability). The index of that element, (connectionCounts at indexA)+1, is what we'll assign to the variable indexB 2702.

[0500] As described in the preceding paragraph, given an element from listA, we already know which element to use from listB (via its connectionCounts). So we merely need to iterate over a subsection of listA and find the combination with an element of listB with the highest probability.

[0501] The variable listA is an Ordered Collection, which is dynamically sized. Elements are only fetched from inputA when they are actually needed. Since inputA may be a whole Squorging sub-tree, unnecessary computations are avoided. However, as we are looking through listA and listB for the next best combination, we must delve deeply enough into the input streams to be assured that the next element won't possibly yield a better combination. When the condition indexA > listA size is true, we are asking for an element at an index beyond the current size of the list 2703. In other words, we need an element that we have not yet pulled from the input stream. As long as that condition holds, we add elements to listA by sending the next message to inputA 2704. That will fetch the next element (either an Index Key or a Tumbler Probability) from inputA (either another Squorger or a Tumbler Probability Stream) and add it to the end of listA.

[0502] In a similar fashion, we handle the fetching of needed elements from inputB for listB 2705, 2706.

[0503] We then compute a temporary variable value for the probability of the Index Key formed by connecting the element at indexA in listA to its next unutilized element from listB (indexB). The value of connecting a given element from listA with a given element from listB is the joint probability of the two elements, which (given the independence of Visual Key Vectors assumption) is the product of the individual element probabilities 2707.

[0504] We use a temporary variable bestValue to keep track of the highest probability that we've found so far, and use bestIndex to remember the index at which the bestValue occurred. At this point, we check to see if the temporary variable bestValue is empty or if value is greater than bestValue 2708. If it is, we'll change the value of bestValue to be what is currently contained in value, and set the temporary variable bestIndex to be what is currently contained in indexA 2709.

[0505] At this point, whether bestIndex and bestValue have been updated or not, the DO loop is repeated for the next indexA 2710. When the iteration is fully evaluated, it will retain the highest probability of a listA-listB conjunction in the variable bestValue and the index to listA at which the highest probability occurred in the variable bestIndex.

[0506] Once the, DO loop has been executed for the full range of slotsto be considered and the best combination chosen, the Squorger updatesits internal bookkeeping. The connectionCounts for bestIndex isincremented, then indexB is set to that value 2711.

[0507] It then checks to see if indexB is equal to 1 2712. If true, thatmeans this is the first time that we are connecting to this slot inlistA. Since this slot was previously unconnected, we need to updatewhere the firstUnConnectedSlot is. In general the firstUnConnectedSlotwill be just beyond where we are connecting, or bestIndex +1. Sincewe'll now be considering an additional slot in listA, we'll need anadditional corresponding element in the dynamically sized OrderedCollection that holds the connectionCounts for each element in listA.This additional element is initially set to zero. 2713

[0508] Then we check to see if we've now connected the slot in listA toall of the slots in listB by seeing if the connection count has justgrown to be equal to sizeB 2714. If so, we need to update thefirstNonFullyConnectedSlot. Since the slot at bestIndex is now fullyconnected, the firstNonFullyConnectedSlot will be just beyond us, or atbestIndex +1 2715.

[0509] We then concatenate listA at bestIndex with listB at indexB 2716. This will produce a new Index Key whose collection of Tumbler Probabilities is the concatenation of the Tumbler Probabilities of the two parts, and whose probability is the product of the probabilities of the two parts. We return this new Index Key in response to the next message 2717.

[0510] Example of How a Squorger Combines Two Lists

[0511] In FIG. 28A, we see an example of a Squorger combining two lists of Tumbler Probabilities, inputA 2800 and inputB 2801, which act as the sources of Tumbler Probabilities for listA 2802 and listB 2803, respectively. connectionCounts 2804 is a dynamic list parallel to listA that keeps track of the number of connections that each of the Tumbler Probabilities in listA has made so far. The variable firstNonFullyConnectedSlot 2805 keeps track of the first element in listA that has not yet been connected to all elements in listB; firstUnConnectedSlot 2806 keeps track of the first element in listA that has not yet been connected to any elements in listB. The variables firstNonFullyConnectedSlot 2805 and firstUnConnectedSlot 2806 provide the boundaries of the range of elements that must be checked in listA for any given point in the process of selecting the next best combination.

[0512] At this point in the process, three connections have been made. The first element in listA is the firstNonFullyConnectedSlot 2805, and has two connections; the second element in listA has one connection; the third element has no connections yet, and so is the firstUnConnectedSlot 2806. The Combination Results 2807 shows the product of each of the combinations.

[0513] FIG. 28B shows the actual process of deciding which of the possible combinations is the best for making a single connection, in this case, the fourth. The elements to be considered in listA are the first (firstNonFullyConnectedSlot 2808) through the third (firstUnConnectedSlot 2809). Each of these elements connects with one element from listB on a trial basis (t1 2810, t2 2811 and t3 2812).

[0514] For each element in listA to be tried, one element in listB is used, determined by the connectionCounts for that element, using the formula listB at: (indexA connectionCounts+1), where indexA is the element in listA to be tried. So the first element in listA combines with the third element in listB (2+1) 2810, the second element in listA combines with the second element in listB (1+1) 2811, and the third element in listA combines with the first element in listB (0+1) 2812. The scores for these trial connections are compared 2813 and the best one chosen, in this case, t1 2810.

[0515] FIG. 28C shows the Squorger after the ninth connection has been made. The connectionCounts for the first element in listA is now 5 2814, which is the same value as the size of listB 2815. In other words, the first element is fully connected. Therefore, firstNonFullyConnectedSlot now becomes the second element in listA 2816. The fifth element in listA is the only one that has not been connected to any element of listB, making it the firstUnConnectedSlot 2817.
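The bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Squorger of the figures: it assumes both input lists are already sorted in descending order of probability, uses 0-based indices where the text uses 1-based ones, and returns bare probabilities and positions rather than concatenated Index Keys. The function name squorge and its return shape are hypothetical.

```python
def squorge(list_a, list_b):
    """Yield (probability, index_a, index_b) for every pairing, best product first.

    Assumes list_a and list_b are sorted in descending order (an assumption
    of this sketch). Indices are 0-based.
    """
    size_b = len(list_b)
    connection_counts = [0]   # grows as new slots of list_a are opened
    first_non_full = 0        # firstNonFullyConnectedSlot
    first_unconnected = 0     # firstUnConnectedSlot
    for _ in range(len(list_a) * size_b):
        best_value, best_index = None, None
        last = min(first_unconnected, len(list_a) - 1)
        for index_a in range(first_non_full, last + 1):
            if connection_counts[index_a] >= size_b:
                continue  # safety guard: slot already fully connected
            # Trial connection: next unused element of list_b for this slot.
            value = list_a[index_a] * list_b[connection_counts[index_a]]
            if best_value is None or value > best_value:
                best_value, best_index = value, index_a
        connection_counts[best_index] += 1
        index_b = connection_counts[best_index]   # 1-based position in list_b
        if index_b == 1:
            # First connection to this slot: open the next slot of list_a.
            first_unconnected = best_index + 1
            connection_counts.append(0)
        if index_b == size_b:
            # Slot is now fully connected: move the lower boundary past it.
            first_non_full = best_index + 1
        yield best_value, best_index, index_b - 1
```

Only the slots between the two boundary variables are examined on each request, which is what keeps the per-request cost small compared with sorting all products up front.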

[0516] Holotropic Stream Recognition

[0517] Experiments have demonstrated the effectiveness of the methods described thus far in recognizing individual picture objects. In a first experiment, one thousand baseball cards were all consistently identified from their video camera images even though the cards were rotated, translated, zoomed, bent, defaced, shadowed, defocused or partially obscured. In another test one million randomly composed geometric compositions were learned and then properly identified even though on testing the query compositions had pieces of their original composition that were randomly missing, displaced, scaled and colorized.

[0518] The methods previously described are, however, suboptimal when the individual picture objects are individual frames of a movie or video stream. The problem occurs in the process of assigning index keys to visual keys. A crucial assumption in the Squorge methodology is that the individual tumbler probabilities are independent. Individual tumblers are identified with individual visual key vectors. For the tumblers to be independent, the individual visual key vectors would need to be uncorrelated. But this is impossible, because the individual frames of a movie or video stream are highly correlated images; otherwise the observer would not sense motion. This high degree of frame-to-frame correlation means that any warp grid vector pattern in a given frame is very likely to be very nearly repeated many times in the stream, which significantly correlates the individual warp grid vectors. Creating index keys for individual stream frames by the methods described thus far therefore does not lead to a desirable uniformly distributed density of index keys across the space of possible index keys, but rather to an undesirably sparse distribution with some of the index keys being duplicated very many times.

[0519] Therefore, in order to extend the present invention to the recognition of streams, it is necessary to add a few additional components to our suite of algorithms. We call this algorithm suite Holotropic Stream Recognition (HSR), Holotropic being conjoined from holo meaning “whole” and tropic meaning “turning towards”.

[0520] Holotropic Stream Recognition is diagrammed in FIGS. 30 and 31. FIG. 30: Holotropic Stream Database Construction, consists of three phases, FIG. 30A: Collecting the Statistics Data, FIG. 30B: Constructing the Decision Tree, and FIG. 30C: Constructing the Reference Bins. Referring to FIG. 30A, a Media Stream is learned by playing it on an appropriate player 3005 and converting it, frame by frame, into a stream of Visual Keys 3010 using the Warp Grid algorithm. A Media Stream may be obtained directly from a video camera, a television or cable broadcast, a film, a DVD, a VHS tape, or any other source of streaming pictures. But rather than indexing the Visual Keys directly by sampling individual Visual Key Components as previously demonstrated, the Index Keys are derived from statistical measurements of the warp grids which are defined over the entire warp grid and which characterize the twists and turns of the adapted warp grids themselves. This Visual Key Statistics Stream 3015 for the Media Stream to be learned is recorded by appending it to the end of the Reference Stream Statistics File 3020, and cataloging it into the Reference Stream Listing File 3025.

[0521] The Reference Stream Statistics File is not used directly to implement Stream Recognition. FIG. 30B: Constructing the Decision Tree, illustrates that a Decision Tree 3035 for converting Visual Key Statistics into Index Keys is explicitly constructed from the Reference Stream Statistics File 3030. The resulting Decision Tree is stored on file storage device 3040. The Decision Tree maps individual media frames into Index Keys by sequentially examining each of their statistical measures and sorting them based on a threshold level which is conditioned on the prior results of previous sorts.

[0522] Referring to FIG. 30C: each line of the Reference Stream Data File 3045 has a sequential frame number, starting with 1. Each frame of the Reference Stream is assigned an Index Key by the Decision Tree 3055 which has been stored on storage device 3050. In general, there are many more Reference Stream frames than there are individual Index Keys. The Reference Stream Frame Numbers are sorted into bins according to their assigned Index Keys 3060, the number of bins being equal to the number of possible Index Keys. This Reference Bin Data File is stored on device 3065. The Reference Bin Data can also be plotted as a Holotropic Storage Incidence Diagram 3070 for visualizing the data.

[0523] Once the Decision Tree and Reference Bins have been constructed, the HSR system can be queried. Referring to FIG. 31: Holotropic Stream Query Recognition, a Query Media Stream, either one of the streams previously learned, a facsimile of a learned stream, a portion of a learned stream or facsimile thereof, or an unrelated stream, is played on a suitable player 3105. A Visual Key Stream 3115 is created from the playing stream by the Warp Grid Algorithm and further reduced to a stream of Visual Key Statistics 3125. Employing the Decision Tree 3140 previously constructed, the stream of Visual Key Statistics is converted into a stream of Index Keys 3135. The Media Stream, Visual Key Stream and Statistics Stream can all be displayed for visual inspection on display devices 3110, 3120 and 3130 respectively.

[0524] Continuing with FIG. 31, the computed Index Keys are used as indices into the database of frame numbers which resides in the Reference Bin Data File 3150. The collection of frame numbers residing in an indexed bin are composited with previously indexed bins to form a Query Tropic 3045, which is a graphical line segment indicating the trajectory and duration of the Query Stream 3145. This trajectory can be plotted as a Query Stream Tropic Incidence Diagram 3155. The Query Tropic is recognized by analyzing a histogram of its frame numbers 3160. The histogram may be plotted as a Query Stream Recognition Diagram 3165.

[0525] Holotropic Stream Recognition Application Program

[0526] A demonstration computer application illustrating Holotropic Stream Recognition is illustrated in FIGS. 32 through 35. FIG. 32, the Visual Key Player, illustrates the user interface window 3200 which contains displays of the Media Stream 3210, the Visual Key Stream 3225 and the Visual Key Statistics Stream 3230. To operate the Visual Key Player, a video source is loaded by clicking the Load Button 3205 and selecting the video file to be played from a pop-up dialog box (not illustrated). The video file is preferably an mov, avi, rel, asf or any other digital video format supported by the Video Player 3210 in the application interface window. In this demonstration application, the video source file may be an advertisement or a movie trailer, as selected by the Source Option 3215, but in general, any video material may be used.

[0527] The Visual Key Player operates in three modes as selected by the Mode Option 3220. Query Mode is used to identify a source video, Learn Mode allows adding a new video to the database of learned videos, and Demo Mode enables the display of the Warp Grid Stream while the video is playing but does not cause learning or querying to occur. Demo Mode also enables the display of the Warp Grid Statistics.

[0528] To add a new video to the database of learned videos, the Learn Mode is selected and the new video is loaded into the Media Player. Its title appears in the Title Text Box 3235. Normally, the Media Player Control 3240 will be set at the start of the video so that the entire video may be learned. Clicking the Visual Key Button 3245 causes the Media Player to enter its play mode and the application to initiate the AutoRun Subroutine, flowcharted in FIG. 36. The AutoRun Subroutine continues to loop while the Visual Key Button remains depressed and the Media Player has not reached the end of the video. The functions performed in the Learn Mode have previously been diagrammed in FIG. 30.

[0529] Operation in the Query Mode is similar to operation in the Learn Mode, with the exception that the loaded video is generally untitled, given that it is the intention of the Query Mode to identify the video that is played. The functions performed in the Query Mode have previously been diagrammed in FIG. 31. Additional functionality in the Visual Key Player Window is obtained by the Unload Button 3250 for unloading the currently loaded video, the Matching Button 3255 for displaying the Matching Window illustrated in FIGS. 33 through 35, the View Button 3260 for displaying and modifying detailed parameters of the Warp Grid Algorithm, and the Exit Button 3065 for exiting the application program.

[0530] FIG. 33: Query Stream Recognition Plot, illustrates one possible output of Holotropic Stream Query Recognition. The individual frame numbers 3305 of the Reference Stream are listed down the left side of the Query Stream Recognition Window 3300. Adjacent to the column of frame numbers is a column of Media Stream Titles 3310, indicating the individual Media Streams composing the Reference Stream, the data for which has been stored in the Reference Stream Listing File. On the right hand side of the Query Stream Recognition Window is a Recognition Plot of the recognizability of the individual frames of the Reference Stream 3315. The length of each individual spike is a measure of how distinguishable a given frame is within the context of all the frames in the Reference Stream. This recognizability measure is not a probability, but rather a count of the number of times a given frame number appears in the Reference Bins indexed by the stream of Index Keys computed for the Query Stream. As such, the Recognition Plot could be converted to a plot of individual frame recognition probabilities by an appropriate scaling of the Recognition axis. It should also be pointed out that the Recognition Plot is a histogram: that is, the frame number axis has been quantized into multiframe intervals. In this illustration, the Recognition Plot Interval is 25 frames. Therefore, strictly speaking, the plot shows the recognizability of an interval of 25 frames rather than the recognizability of individual frames. The Recognition Plot Interval defines the minimum length of the shortest snippet of a Query Media Stream which may be usefully recognized: in this case, less than one second for a 30 frames/second display.

[0531] The Query Stream Recognition Window also identifies the actual and matched Query Streams for purposes of testing the performance of the system. The title of the actual Query Stream, if it is known, is displayed in the Actual Title Text Box 3330, while the result of recognition is displayed in Matched Title Text Box 3335. The actual duration of the Query Stream is displayed as a vertical bar 3320 in the space between the Titles and the Recognition Plot, the top and bottom of the bar indicating the actual starting and stopping frame of that portion of the Reference Stream which is being played as the query. If the Query Stream is not part of the Reference Stream, this vertical bar is not displayed. The estimated starting and stopping position of that portion of the Reference Stream which is matched to the Query Stream is displayed as a second vertical bar 3325. As can be seen from the example plot, the estimated duration of the matched stream is slightly greater than the actual duration of the Query Stream. The methods employed for matching and estimating Query Stream duration from the Recognition Plot are covered in detail in the flow chart of FIGS. 49A and 49B.

[0532] Clicking the Query Button 3340 of the Query Stream Recognition Window constructs the Query Stream Tropic Window 3400 illustrated in FIG. 34. The Query Stream Tropic Incidence Diagram 3405 is so named because the Query Stream appears in the diagram as a diagonal line segment, with the left and right ends of the segment indicating the start and stop of the matched Query Stream in the Reference Stream. The diagram also indicates portions of the Reference Stream which only partially correlate with the Query Stream. These are seen as short horizontal line segments 3410. The longest of these line segments are readily distinguishable from the Query Tropic 3415 because they lack both the length and the inclination of the Query Tropic. The inclination of the Query Tropic of course arises from the fact that the frames of the Query Stream are sequentially matched by the frames of the appropriate portion of the Reference Stream. If there is no match of the Query Stream in the Reference Stream, the Query Tropic is absent from the diagram.

[0533] Clicking the Reference Button 3345 of the Query Stream Recognition Window constructs the Reference Stream Window 3500 illustrated in FIG. 35. This window contains the plot of the Holotropic Storage Incidence Diagram 3505. This diagram plots the contents of the Reference Bins. The Reference Bins and the Decision Tree are the two database entities that actually embody the information needed to determine if any sub-sequence of the Reference Stream has been matched by the Query Stream.

[0534] Index Keys are plotted horizontally and Reference Stream frame numbers vertically in the diagram. The illustrated application program employs 9 statistical measures to characterize the warp grids. Hence, there are 2⁹ Index Keys in the range 0 to 511. In this example there are 17704 Reference Stream frames. Thus, on the average, each Index Key is repeated about 35 times. When Index Key i contains frame j, the incidence diagram places a black dot at column i, row j. The resultant diagram has an overall random appearance with very little structure, reminiscent of an unreconstructed transmission hologram, hence the name Holotropic Storage. It is only when the individual columns of the diagram are reordered according to the sequence of Index Keys for the frames of the Query Stream that the Tropic for identifying the Query Stream emerges from the noise.

[0535] Subroutine Flow Charts

[0536] The main subroutine of the application program is called AutoRun and it is flowcharted in FIG. 36. It consists of an initializing portion which is entered when the Visual Key button on the interface window is clicked, a loop that is repeated as long as the Media Player is playing, and a terminating portion. AutoRun makes subroutine calls to the other principal subroutines in the application program.

[0537] Entering AutoRun at 3600, the first action is to determine the running mode and take the appropriate initializing action. If the user has selected Learn Mode 3602, then the frame counter i 3604 begins at the last recorded frame number+1 and the Reference Stream Statistics file is opened for appending the new statistical data 3606. If the user has selected the Query Mode 3608, then a Query Stream Statistics file is created 3610 and the frame counter 3612 is initialized at 0. If the user selects Demo Mode, no statistical data is collected.

[0538] A new Warp Grid is created 3614 and subsequently initialized 3616 as flowcharted in FIG. 37. The main loop 3618 of AutoRun is repeated while the Media Stream is playing. First, the frame counter i is incremented 3620. Next, the statistics of the warp grid are computed 3622 as flowcharted in FIG. 40, and, if the Demo Mode has been selected, the statistics are plotted 3624 on the Statistics Meter on the user interface window. If Demo Mode has not been selected, then these same statistics are written to the Reference Stream Statistics file or the Query Stream Statistics file, whichever is appropriate 3626.

[0539] On each pass through the loop, the warp grid is initialized and adapted for a fixed number of iterations (NumIterations) sufficient for it to reach a near equilibrium condition 3630. Each adaptation iteration occurs in two steps: sampling the Media Player window 3210 at the warp grid points 3632, and adapting the warp grid using the sampled levels 3634. The subroutines for these two steps are flowcharted in FIGS. 38 and 39 respectively. Finally, if Demo Mode has been selected 3635, the adapted warp grid is plotted in the Warp Grid Picture 3636 (3225 of the Visual Key Window 3200).

[0540] When the Media Stream is no longer playing, the loop 3618 is exited and the file for receiving the statistical data is closed 3638. If the Learn Mode is selected 3640, then the Reference Stream Last Frame is set to the frame counter i 3642 and the application program calls the Learn subroutine flowcharted in FIG. 41 to operate on the Reference Stream Statistics file 3644. If the Query Mode is selected 3646, then the Recognize subroutine flowcharted in FIG. 45 is called to operate on the Query Stream Statistics file 3648.

[0541] Subroutine InitializeWarpGrid 3700 flowcharted in FIG. 37 establishes the points of the warp grid in their initial positions PtsU and PtsV within the U,V space of the warp grid. The number of points to be initialized in the U and V directions are UCnt and VCnt respectively, which are entered as arguments to the subroutine. The variables SpaceU and SpaceV 3702 determine the spacing of the initial warp grid point placements, which are individually placed within the nested iterations 3704 and 3706 according to the linear calculations of 3708. These starting positions of the warp grid points are held in array variables StartPtsU and StartPtsV 3710, and are also copied into the array variables PtsU and PtsV 3712, which maintain the current positions of these points.
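A minimal sketch of such an initialization follows. The text does not give the linear calculation of step 3708, so the spacing formula here (points centered in equal cells spanning the range −1 to +1) is an assumption of this sketch, as are the function name and the list-of-lists layout indexed [m][n].

```python
def initialize_warp_grid(u_cnt, v_cnt):
    """Place u_cnt x v_cnt grid points on a regular lattice in U,V space.

    Assumed layout (not specified in the text): U,V span -1..+1 and each
    point sits at the center of its cell.
    """
    space_u = 2.0 / u_cnt   # plays the role of SpaceU
    space_v = 2.0 / v_cnt   # plays the role of SpaceV
    start_u = [[-1.0 + space_u * (m + 0.5) for n in range(v_cnt)]
               for m in range(u_cnt)]
    start_v = [[-1.0 + space_v * (n + 0.5) for n in range(v_cnt)]
               for m in range(u_cnt)]
    # Current positions start out as independent copies of the start positions.
    pts_u = [row[:] for row in start_u]
    pts_v = [row[:] for row in start_v]
    return start_u, start_v, pts_u, pts_v
```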

[0542] Subroutine SampleWarpGrid 3800 flowcharted in FIG. 38 obtains the pixel brightness levels of the Media Player Window at the Warp Grid point locations U and V. Iterations 3802 and 3804 index through the Warp Grid points 3806. If a point's value U is greater than 1 so that it falls outside the Warp Grid Bounding Rectangle, then it is decremented by 2 so that it samples inside the rectangle, but on the opposite side of the rectangle 3808. Likewise, if a point's value U is less than −1 so that it falls outside the Warp Grid Bounding Rectangle, then it is incremented by 2 so that it samples inside the rectangle, but on the opposite side 3810. Similarly, Warp Grid point values V are wrapped if they fall outside the bounding rectangle 3812, 3814.

[0543] The subroutine ConvertUVtoSourceXY 3816 establishes the mapping from the U,V space of the Warp Grid to the x,y space of the Media Player Window. Finally, the brightness of a pixel at x,y in the Media Player Window is sampled by the SourceSample subroutine 3818 and stored in the array variable PtsC.
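The wrap-and-sample step can be illustrated as follows. The single ±2 wrap follows the description above; the linear U,V to x,y mapping and the representation of the Media Player Window as a list of pixel rows are assumptions made for this sketch, and all function names are hypothetical.

```python
def wrap(coord):
    """Wrap a U or V value back into -1..+1 with a single +/-2 offset."""
    if coord > 1.0:
        return coord - 2.0
    if coord < -1.0:
        return coord + 2.0
    return coord


def uv_to_xy(u, v, width, height):
    """Assumed linear mapping from U,V in -1..+1 to integer pixel x,y."""
    x = min(int((u + 1.0) / 2.0 * width), width - 1)
    y = min(int((v + 1.0) / 2.0 * height), height - 1)
    return x, y


def sample_warp_grid(pts_u, pts_v, image):
    """Return brightness levels (PtsC) sampled at each warp grid point.

    image is a list of rows of brightness values (an assumed stand-in for
    the Media Player Window bitmap).
    """
    height, width = len(image), len(image[0])
    pts_c = []
    for row_u, row_v in zip(pts_u, pts_v):
        row_c = []
        for u, v in zip(row_u, row_v):
            x, y = uv_to_xy(wrap(u), wrap(v), width, height)
            row_c.append(image[y][x])
        pts_c.append(row_c)
    return pts_c
```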

[0544] Subroutine AdaptWarpGrid 3900 flowcharted in FIG. 39 adapts the current warp grid by a single iteration at the specified Warp Rate. A pair of nested iterations 3902, 3904 treats each warp grid point m,n individually. Each individual warp grid point is connected via its connectivity pattern to surrounding points of the warp grid. Here, the connectivity pattern is called “neighborhood connectivity” because the connected points are all the immediate neighbors in the initialized warp grid. The nested iterations on i and j 3908, 3910 iterate over the points of the neighborhood of grid point m,n. The width of these iterations is determined by the neighborhood radius NbrRadius.

[0545] The adaptation method employed here uses a center-of-gravity calculation on the points of an adapted warp grid. That is, the points of the warp grid may be significantly displaced from their starting positions. The center-of-gravity is computed over the points in the neighborhood connectivity pattern. Each point is given a weight equal to the brightness of the corresponding pixel in the Media Player Window. The “lever arm” of the center-of-gravity calculation is the current distance between the given warp grid point and its neighbor. The variables necessary for the center-of-gravity calculation are initialized 3906 for each point in the warp grid.

[0546] The points of the warp grid are toroidally connected, hence the modular calculation 3912 for modU and modV, which are restricted to the ranges 0 to UCnt−1 and 0 to VCnt−1 respectively.

[0547] Recall that the bitmap in the Media Player Window is also treated as being toroidally wrapped, hence the bitmap can be viewed as an infinite repeating patchwork. That is why the subroutine SampleWarpGrid applies offsets of +2 and −2 whenever U or V goes outside their bounding ranges −1 to +1. But the center-of-gravity calculation does not give the pixel sampled from the opposite edge of the bitmap a lever arm that long; rather, the lever arm of the calculation is taken from the point's unwrapped position even if it falls outside the bounding rectangle.

[0548] The calculations for testing m+i and applying the appropriate value to offsetU 3914, 3916 and 3918 ensure that the neighborhood geometries will be contiguous in the U direction as discussed in the preceding paragraph. Similarly, testing n+j and applying the appropriate value to offsetV 3920, 3922 and 3924 ensures that the neighborhood geometries will be contiguous in the V direction.

[0549] Summing the sampled levels of each warp grid point in the neighborhood yields the zeroth moment of the center-of-gravity calculation 3926 for the point m,n. The first moment is of course the sum of the sampled levels weighted by the distance of the sampling point. These distances are taken with respect to the U and V coordinates of the grid point m,n. The offsetU previously calculated is applied to the coordinate ptsU of the sampling point in summing the First Moment for U 3928. Similarly, the offsetV previously calculated is applied to the coordinate ptsV of the sampling point in summing the First Moment for V 3930.

[0550] At the conclusion of the nested iterations over the neighborhood of grid point m,n, the first moment is tested for a zero value 3936 and, if true, the location of grid point m,n remains unchanged 3938. Otherwise, the temporary array variable NewPtsU is calculated by offsetting the U coordinate of grid point m,n by an amount proportional to the center-of-gravity's U coordinate, the constant of proportionality being the Warp Rate 3940. Similarly, NewPtsV is calculated by offsetting the V coordinate of grid point m,n by an amount proportional to the center-of-gravity's V coordinate, again scaled by the Warp Rate 3942.

[0551] Only after the nested iterations on n and m have completed does the actual changing of the warp grid point array variables PtsU and PtsV occur. Nested loops on n and m 3946 and 3948 iterate over all grid points, replacing PtsU and PtsV with the temporary array variables NewPtsU and NewPtsV respectively 3950.
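One adaptation iteration of the kind just described might be sketched as below. This is an illustration under stated assumptions rather than the flowcharted subroutine: arrays are lists of lists indexed [m][n] with m running in the U direction, the seam offsets are taken to be ±2 (the width of the U,V bounding range), and the guard tests the zeroth moment for zero so that the division is always defined, which is a choice made for this sketch.

```python
def adapt_warp_grid(pts_u, pts_v, pts_c, warp_rate, nbr_radius=1):
    """Return new (pts_u, pts_v) after one center-of-gravity iteration."""
    u_cnt, v_cnt = len(pts_u), len(pts_u[0])
    new_u = [row[:] for row in pts_u]   # NewPtsU: written now, applied at end
    new_v = [row[:] for row in pts_v]   # NewPtsV
    for m in range(u_cnt):
        for n in range(v_cnt):
            zeroth = first_u = first_v = 0.0
            for i in range(-nbr_radius, nbr_radius + 1):
                for j in range(-nbr_radius, nbr_radius + 1):
                    # Toroidal connectivity: wrap the neighbor's grid index.
                    mod_u, mod_v = (m + i) % u_cnt, (n + j) % v_cnt
                    # Seam offsets keep the neighborhood geometrically contiguous.
                    offset_u = -2.0 if m + i < 0 else (2.0 if m + i >= u_cnt else 0.0)
                    offset_v = -2.0 if n + j < 0 else (2.0 if n + j >= v_cnt else 0.0)
                    level = pts_c[mod_u][mod_v]
                    zeroth += level
                    first_u += level * (pts_u[mod_u][mod_v] + offset_u - pts_u[m][n])
                    first_v += level * (pts_v[mod_u][mod_v] + offset_v - pts_v[m][n])
            if zeroth != 0.0:  # assumed guard; point stays put otherwise
                new_u[m][n] = pts_u[m][n] + warp_rate * first_u / zeroth
                new_v[m][n] = pts_v[m][n] + warp_rate * first_v / zeroth
    return new_u, new_v
```

Writing into separate new arrays and returning them mirrors the deferred update of the text: every point's center of gravity is computed from the same, unmodified set of current positions.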

[0552] Subroutine ComputeStatistics 4000 flowcharted in FIG. 40 generates a set of nine statistics on the points of the fully adapted warp grid. The statistics are the average values over all the warp grid points of the quantities U^i*V^j (U coordinate raised to the i-th power times V coordinate raised to the j-th power). These statistics are the higher moments and cross-moments of the warp grid patterns. The nine statistics that have been chosen for this application program are those forms for which i+j>0 and i+j<n, where n=4. In general, n could take on other values, the higher values of n leading to arithmetically more statistics with a geometric rise in the number of possible index keys. Or other sets of statistics could be defined.

[0553] Nested iterations on n and m 4002, 4004 individualize warp grid point coordinate pairs U,V 4006. Offsets are applied if necessary to wrap point U,V back into the bounding rectangle at 4008, 4010, 4012 and 4014. These four steps may be omitted. Next the partial sums of the nine statistics are obtained in 4016.

[0554] Upon exiting the nested iterations on the points of the warp grid, the nine statistics are computed as the average values of the nine partial sums 4018.
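The nine moments can be enumerated and averaged as in the following sketch. The ordering of the exponent pairs and the function name are assumptions; the text fixes only the set of pairs with 0 < i+j < 4. The optional wrapping steps are omitted here, as the text permits.

```python
# All exponent pairs (i, j) with 0 < i + j < 4; there are exactly nine.
EXPONENTS = [(i, j) for i in range(4) for j in range(4) if 0 < i + j < 4]


def compute_statistics(pts_u, pts_v):
    """Average U**i * V**j over all warp grid points for each exponent pair."""
    sums = [0.0] * len(EXPONENTS)
    count = 0
    for row_u, row_v in zip(pts_u, pts_v):
        for u, v in zip(row_u, row_v):
            count += 1
            for k, (i, j) in enumerate(EXPONENTS):
                sums[k] += (u ** i) * (v ** j)   # partial sums
    return [s / count for s in sums]
```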

[0555] Subroutine Learn 4100 flowcharted in FIG. 41 consists of the three major steps for converting the Reference Stream Statistics file into Decision Tree data and Reference Bins data. First the Reference Stream data is read from the specified file 4102. The subroutine ComputeDecisionTree 4104 creates the Decision Tree database as flowcharted in FIG. 42. The subroutine ComputeDecisionTree also constructs index keys for each frame of statistical data in the Reference Stream Statistics file. Finally, the subroutine StuffReferenceBins 4106 creates the Reference Bins database and is illustrated in FIG. 44.

[0556] Subroutine ComputeDecisionTree 4200 flowcharted in FIG. 42 constructs the Decision Tree and Index Keys for the data in the Reference Stream Statistics file which is specified in the datafile argument in the subroutine call. The general principle of the Decision Tree construction is to treat the nine statistical measures sequentially, starting with the first statistical measure. To begin, an Index Key of 0 is assigned to each frame of the datafile. Next, the median value of the first statistic is determined over all the frames of the datafile. The first statistic of each individual frame of the datafile is then compared to this first median value. If the first statistic is greater than this first median value, then the value 1 is added to the corresponding Index Key for the frame. This first operation on the first statistical measure partitions the datafile into those frames with an Index Key of 0 and those frames with an Index Key of 1.

[0557] Next, we consider the second statistical measure. Two additional statistical medians are computed for the second statistic over the entire datafile, a second median for those frames whose Index Key is 0 and a third median for those frames whose Index Key is 1. The second statistical measure of those frames whose Index Key is 0 is compared to this second median and if the second statistic of the frame exceeds this second median, then the Index Key of the frame is incremented by 2. Similarly, the second statistical measure of those frames whose Index Key is 1 is compared to the third median and if the second statistic of the frame exceeds this third median, then the Index Key of the frame is incremented by 2. This second operation on the second statistical measure partitions the datafile into four groups specified by their Index Keys, at this operation having possible values of 0, 1, 2 and 3.

[0558] The process continues for the remaining statistical measures. At each successive stage of the process, the number of new statistical medians needed to be calculated is doubled. Similarly, the number of possible Index Key values is doubled as well. At the completion of the ninth statistical measure, the number of statistical medians calculated in total will be 511, or 2⁹−1, while the number of possible Index Keys will be 512, or 2⁹.
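The totals quoted above follow from summing the doubling sequence of medians over the nine stages, stage m requiring 2^(m−1) medians:

```latex
\sum_{m=1}^{9} 2^{\,m-1} \;=\; 1 + 2 + 4 + \dots + 256 \;=\; 2^{9} - 1 \;=\; 511,
\qquad
\text{number of possible Index Keys} \;=\; 2^{9} \;=\; 512 .
```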

[0559] Referring now to FIG. 42, the subroutine begins with an initializing iteration over all of the frames of the datafile 4202 setting a corresponding Index Key to zero 4204. Next is the actual iteration over the nine individual statistical measures indexed by m 4206. As can be derived from the previous paragraphs, the m-th statistical measure requires that 2^(m−1) statistical medians be calculated, hence a further nested iteration on an integer k runs over values 0 to 2^(m−1)−1 as shown in 4208. Each newly computed statistical median 4210 is stored in a two-dimensional array variable Slices(m,k). The computation of the statistical median is illustrated in the flowchart of FIG. 43.

[0560] After all the statistical medians for the m-th statistical measure are calculated, the datafile can be iterated frame-by-frame 4212 and the Index Keys of the individual frames can be incremented or not by a value of 2^(m−1) depending on whether the m-th statistic is equal to or greater than the appropriate statistical median given the frame's present Index Key 4214. The construction of the Decision Tree is complete 4216 when all the statistical measures have been dealt with in this manner.

[0561] Because each branch of the Decision Tree is constructed by partitioning an array of statistical measures approximately in half by comparing it to the median value of the array, the resultant tree may be considered balanced, in that each of the possible Index Key values for the frames of the datafile can be expected to be represented about an equal number of times. This is exactly what is observed on the actual data. As can be observed from the Holotropic Storage Incidence Diagram of FIG. 35, each Index Key column has approximately the same number of black dots. This uniform distribution of the Index Keys through the space of possible Index Keys is the highly desirable result that could not be obtained on picture object streams using the methods previously described for individual picture objects employing the Squorging algorithm.

[0562] Function StatMedian 4300, which is repeatedly called from Subroutine ComputeDecisionTree, is flowcharted in FIG. 43. The function accepts as arguments an Index Key value (indexKey) and the index of the statistical measure presently under consideration m. After first initializing a temporary variable count as zero 4302, the individual frames of the datafile Reference Stream Statistics are iterated 4304. The Index Key corresponding to each frame is compared to the argument indexKey 4306 and if they match, count is incremented by one 4308, and the m-th statistical measure for the i-th frame of the datafile is added to temporary array variable array 4310. Recall that on calling StatMedian from ComputeDecisionTree, the Index Keys are being “built” one statistical measure at a time. Therefore, the array IndexKeys contains these “partially built” keys.

[0563] What is needed is the statistical median of the contents of temporary array variable array. This is obtained by first sorting array using the QuickSort method 4312. Since QuickSort returns array sorted in numerically ascending order, the Statistical Median can be directly determined by drilling down halfway through the sorted list to obtain the value at the mid-position in the sorted list 4314.
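The median computation and the key-building loop described above can be sketched as follows. This is a minimal illustration, not the flowcharted program itself: the function names, the list-of-lists layout of the statistics, the dictionary used to hold the slicing values, and the zero-based measure index (so that 2^(m−1) becomes 2**m) are all assumptions made for the example.

```python
def stat_median(stats, index_keys, index_key, m):
    """Median of the m-th measure over frames whose partial key equals index_key."""
    values = sorted(
        stats[i][m] for i in range(len(stats)) if index_keys[i] == index_key
    )
    if not values:
        return None  # assumption: no frame carries this partial key
    return values[len(values) // 2]  # value at the mid-position of the sorted list


def compute_decision_tree(stats, num_measures):
    """Build slicing values and Index Keys one statistical measure at a time."""
    index_keys = [0] * len(stats)
    slices = []  # slices[m][partial_key] -> median used as the slicing value
    for m in range(num_measures):  # zero-based m, so the increment is 2**m
        medians = {
            key: stat_median(stats, index_keys, key, m) for key in set(index_keys)
        }
        slices.append(medians)
        for i, frame in enumerate(stats):
            if frame[m] >= medians[index_keys[i]]:
                index_keys[i] += 2 ** m
    return slices, index_keys
```

With nine measures per frame, the keys produced this way range over 0 to 511.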

[0564] The final step of the Learning process is the subroutine StuffReferenceBins 4400 flowcharted in FIG. 44. The integer variable i is iterated over all the frames in the datafile Reference Stream Statistics 4402. Each frame of the datafile has an associated completed Index Key, the Learn process having just computed the Decision Tree and the Index Keys in the process. The two-dimensional array variable Bins is initialized to 512 individual bins corresponding to the 512 possible Index Key values arising from the 9 statistical measures. The size of each bin is fixed in this example at a constant BINMAXCOUNT, although the bin storage does not have to be of fixed size and could be redimensioned as desired. In this example application program, the bin counts and bin contents are updated only if the bin count for the bin number indexed by the Index Key of the i^(th) frame of the datafile is less than BINMAXCOUNT 4404. If so, the appropriate bin count is incremented by one 4406 and the frame number i is added to the appropriate bin 4408.
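The bin-stuffing step can be sketched as below. The function name, the use of Python lists in place of a fixed two-dimensional array, and the default capacity value are assumptions for illustration only; the text does not state the value of the bin-size constant.

```python
def stuff_reference_bins(index_keys, num_bins=512, bin_max_count=100):
    """Place each reference frame number into the bin named by its Index Key.

    bin_max_count stands in for the fixed bin size; 100 is a hypothetical value.
    """
    bins = [[] for _ in range(num_bins)]
    for frame_number, key in enumerate(index_keys):
        if len(bins[key]) < bin_max_count:  # skip frames once the bin is full
            bins[key].append(frame_number)
    return bins
```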

[0565] This concludes the discussion of the subroutines required for the Learn Mode of the example application program. We continue with a discussion of the subroutines required for the Query Mode of operation of the application program.

[0566] When the AutoRun subroutine is in its main loop in Query Mode, the statistics of the Visual Keys of each frame are written to the datafile QueryStreamStatistics. When the Query Stream ends or is manually shut off, this file is accessed by the Recognize subroutine 4500 flowcharted in FIG. 45. The first step of the recognition process is to read the datafile statistics in the ReadQueryDataFile subroutine 4502. Next, the Index Keys of the Query Stream are computed from the Query Stream Statistics and the Decision Tree obtained in Learn Mode in the ComputeIndexKeys subroutine 4504, flowcharted in FIG. 46. Next, the bins of the Reference Bins collected during Learn Mode are reordered according to the Query Stream Index Keys, which creates a Query Tropic. The subroutine ComputeQueryTropic 4506 is flowcharted in FIG. 47. Next, the Query Tropic is projected onto its Reference Stream Frame Number dimension, resulting in a histogram which plots the frequency of occurrence of Frame Numbers in the Query Tropic. The subroutine ComputeRecognitionHistogram 4508 is flowcharted in FIG. 48. Next is the plotting of the Recognition Histogram 4510. Finally, the subroutine DisplayRecognitionResults 4512, flowcharted in FIGS. 49a and 49b, analyzes the Recognition Histogram for its peak value and the width of that peak to make a positive match or to refrain from matching.

[0567] Referring now to the subroutine ComputeIndexKeys 4600, which is flowcharted in FIG. 46, a first iteration 4602 on i over the frames of the datafile QueryStreamStatistics sets the Index Key for each frame to zero 4604. A next iteration on m 4606 considers the nine statistical measures sequentially over all frames in the datafile, which is a nested iteration over i for the length of the datafile 4608. At each iteration of m, all of the Index Keys for the datafile frames are recomputed. The recomputation consists of either adding 2^(m−1) to the Index Key or not. The decision is based on comparing the m^(th) statistic of the i^(th) frame to the Decision Tree value stored at Slices(m,IndexKeys[i]). If Stats(m,i) is greater than or equal to the Decision Tree Slicing value 4610, then the Index Key for the frame is incremented by 2^(m−1) 4612; otherwise, the Index Key for the frame is unchanged.
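A sketch of the query-side key computation follows. It assumes the slicing values are held as one dictionary per measure, keyed by the partial Index Key, and uses a zero-based measure index so that the increment 2^(m−1) is written 2**m. Leaving a key unchanged when no slicing value exists for a partial key is an assumption; the text does not describe that case.

```python
def compute_index_keys(query_stats, slices):
    """Compute an Index Key for every query frame from stored slicing values."""
    index_keys = [0] * len(query_stats)       # first pass: zero every key
    for m in range(len(slices)):              # one statistical measure at a time
        for i, frame in enumerate(query_stats):
            threshold = slices[m].get(index_keys[i])
            if threshold is not None and frame[m] >= threshold:
                index_keys[i] += 2 ** m
    return index_keys
```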

[0568] FIG. 47 flowcharts the ComputeQueryTropic subroutine 4700. After the Index Keys for all the QueryStreamStatistics datafile frames have been computed, the contents of the Reference Bins obtained in Learn Mode are selected and ordered by the sequence of Index Keys for the Query Stream. A first iteration on i 4702 considers each frame over the length of the Query Stream datafile. A second nested iteration over k 4704 runs through the contents of the bin indexed by the i^(th) Index Key. The two-dimensional array variable Tropic is indexed on i and k and collects the frame numbers stored in the designated bins 4706.

[0569] The subroutine ComputeRecognitionHistogram 4800 is flowcharted in FIG. 48. A first iteration on i over the length of the QueryStreamStatistics datafile considers each frame of the query sequentially 4802. A second nested iteration on k selects each frame number in the Reference Bin identified by the Index Key of frame i of the datafile 4804. These frame numbers have already been copied into the Query Tropic in the previously called subroutine ComputeQueryTropic; therefore, a one-dimensional array variable histogram is incremented for each instance of a frame number in Tropic(i,k) which falls into the preselected histogram interval HISTINTERVAL 4806. In this example application program, HISTINTERVAL has been chosen to be 25.
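The construction of the Query Tropic and its projection onto the reference frame-number axis can be sketched together. The function names and the list-based data layout are assumptions; only the interval width of 25 comes from the text.

```python
def compute_query_tropic(query_keys, bins):
    """Row i holds the reference frame numbers stored in the bin of query frame i."""
    return [list(bins[key]) for key in query_keys]


def compute_recognition_histogram(tropic, num_reference_frames, hist_interval=25):
    """Count how often reference frame numbers fall into each histogram interval."""
    num_intervals = (num_reference_frames + hist_interval - 1) // hist_interval
    histogram = [0] * num_intervals
    for row in tropic:
        for frame_number in row:
            histogram[frame_number // hist_interval] += 1
    return histogram
```

A query drawn from one stretch of the reference stream tends to pull the same neighboring frame numbers out of the bins repeatedly, which is what produces a peak in the histogram.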

[0570] The last subroutine to be examined, flowcharted in FIGS. 49a and 49b, is DisplayRecognitionResults 4900 (note: this subroutine is split into two figures solely for space considerations). The function of this subroutine is to first determine the maximum peak of the recognition histogram, then to determine the width of the peak and the area under the peak, then to compare the width of the peak against the length of the Query Stream as determined by the Media Player. The subroutine then tests to see that the area under the peak is greater than a selected percentage of the entire histogram area, that the peak height is greater than a preselected minimum, and that the estimated width of the peak is at least a selected percentage of the actual Query Stream play time. If all these conditions are met, then the subroutine identifies the Query Stream from the position of its peak. Of course, this is an example of how the Query Tropic could be analyzed for identifying the Query Stream. There are countless other ways that this analysis could be carried out, and one skilled in the art could no doubt supply an endless stream of alternative analytical techniques, all of which accomplish essentially the same result.

[0571] Referring to FIG. 49, after the variable maxHistValue is initialized to zero 4902, an iteration on j over all the intervals of the recognition histogram 4904 computes the histogram area 4906 and tests for the maximum value 4908, which is stored in maxHistValue 4910 with its interval being noted at maxHistInterval 4912.

[0572] The upper edge of this histogram peak is determined by the next iteration on j, which begins at the center of the peak and iterates towards the upper bound of the histogram 4914. When the histogram value falls below a selected fraction of the histogram peak value 4916, here chosen to be 0.05, the variable j2 marks the interval of the upper edge 4918, and an estimate of the Query Stream Stop Frame is calculated 4920 before the iteration is prematurely terminated 4922.

[0573] Likewise, the lower edge of this histogram peak is determined by the next iteration on j, which begins at the center of the peak and iterates towards the lower bound of the histogram 4924. When the histogram value falls below a selected fraction of the histogram peak value 4926, here chosen to be 0.05, the variable j1 marks the interval of the lower edge 4928, and an estimate of the Query Stream Start Frame is calculated 4930 before the iteration is prematurely terminated 4932.

[0574] Following the initialization of the variable peakArea to zero 4933, the intervals of the histogram from j1 to j2 are iterated 4934 to determine the area under the peak 4936, which is used to calculate the peakAreaRatio 4938, i.e., the ratio of the peak area to the entire histogram area.
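The peak search, the two edge searches, and the area ratio can be sketched as one function. The function name and return layout are assumptions, as is the choice to let an edge default to the end of the histogram when the values never drop below the threshold; only the 0.05 edge fraction comes from the text. The histogram is assumed to be non-empty.

```python
def analyze_peak(histogram, edge_fraction=0.05):
    """Return (max_value, max_interval, j1, j2, peak_area_ratio)."""
    total_area = sum(histogram)
    max_value = max(histogram)
    max_interval = histogram.index(max_value)
    threshold = edge_fraction * max_value

    j2 = len(histogram) - 1               # assumption: default to the last interval
    for j in range(max_interval, len(histogram)):
        if histogram[j] < threshold:      # upper edge found, stop early
            j2 = j
            break

    j1 = 0                                # assumption: default to the first interval
    for j in range(max_interval, -1, -1):
        if histogram[j] < threshold:      # lower edge found, stop early
            j1 = j
            break

    peak_area = sum(histogram[j1:j2 + 1])
    ratio = peak_area / total_area if total_area else 0.0
    return max_value, max_interval, j1, j2, ratio
```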

[0575] The actual duration of the Query Stream can be obtained directly from the Media Player as the difference between the start and stop times of the played stream 4942. The estimated start and stop frames from the histogram peak analysis are converted to actual start and stop Query Stream times by interpolating the catalog entries in the Reference Stream Listing file 4943, thus yielding an estimated Query Stream duration from the peak analysis 4944. The duration ratio is then just the ratio of the estimated to the actual duration 4946.

[0576] The test for determining whether the Query Stream is matched by some portion of the Reference Stream is to compare the peakAreaRatio, the maxHistValue and the durationRatio to acceptable minimums 4950 and, if they are all greater than their acceptable minimums, then to plot the indicator for the estimated Query Stream play interval 4952 on the Query Stream Recognition Window, shown as 3325 on FIG. 33, to print the word “MATCHED” in this same window 4954, and to obtain the title for the matched Query Stream from the frame number of the peak maximum from the Reference Stream Listing datafile 4956. Otherwise, if not all of the acceptable minimums are exceeded 4958, then the result “NO MATCH” is printed 4960 with the matched title “Unmatched” 4962.

[0577] Finally, for comparison and testing, the actual title of the Query Stream is displayed 4964, if known, along with the play interval 4966 of the actual played stream, which is plotted as 3320 on the Recognition Window of FIG. 33.

[0578] Assigning Keywords to Images

[0579] Throughout the discussions of the present invention it has been repeatedly stated that the purpose of the invention was to enable the searching of media databases with query media. Here, the term media can mean still pictures, streaming pictures, or recorded sound. Although it was stated that the media query search could be augmented by natural language descriptors such as keywords or phrases, it has been repeatedly emphasized that the strength of the present invention is primarily its ability to perform a search without keywords or phrases, i.e., with pictures alone. But having asserted this premise, it may be useful to examine further the relationship between pictures and their keywords in order to clarify further possible applications of the methods presented herein.

[0580] Most media database searching is performed with keywords. Thus, if a person desires a picture of Humphrey Bogart, he enters the words “Humphrey Bogart” into a media search engine and he is presented a set of links to media which have previously been tagged with the keywords Humphrey Bogart. Now, this process of tagging or associating keywords with media such that they may be searched by conventional search engines is an area of considerable interest. Largely, the process of tagging is an intensely manual one, depending upon human perception to assign tags which correspond to the pictorial or aural content of the media. It is in response to this problem that a suggestion is put forward here that the methods of the present invention may be employed autonomously or with human assistance to greatly ease the burden of assigning keywords to media.

[0581] When a picture appears on the Internet, it does not usually appear in isolation from textual material. First of all, the picture, being a file, has a file name, which is its first textual asset. The picture is usually the target of a hyperlink, and that hyperlink is another textual asset when it is hyperlinked text, or, if the hyperlink is another image, then the filename of that image is a textual asset. The title of the page on which a picture resides is a textual asset, as is the URL of the page, metatags on the page (which may intentionally contain keywords), and all of the text on the same page as the picture. Text which appears in the immediate vicinity of a picture is potentially a more valuable textual asset than other words on the page, words that appear in the same frame or table as the picture again being potentially more valuable. In short, a piece of media usually resides in an environment rich in textual material, and within this wealth of textual material may lie effective keywords for tagging the picture. The problem is determining which words derivable from the set of all textual assets are good keywords.

[0582] It is in answering the above question that the methods of picture searching previously described may be employed. Suppose we have an Internet image and all of its associated textual assets that we have automatically captured from its web page environment. Now we perform a Visual Key search of the Internet using the methods described herein by first making the picture in hand a reference picture and extracting and recording its Visual Key. Now we automatically crawl the Internet using a software robot or spider looking for files of the usual types for images, i.e., JPEGs or GIFs, and for each one that we find we generate its Visual Key and match it to our reference image. Each time that we find a sufficient match to the reference image we collect all of the textual assets of the matched image. When a sufficient number of matches to the initial picture have been found, all of the textual assets for the matching pictures can automatically be statistically analyzed for the frequency of occurrence of individual words. Common non-descriptive words can be thrown out immediately, while at the same time the words contained in the image file names can be given a higher weighting in the process. Although it is not the intention of this disclosure to describe in detail how all of the textual materials found associated with matched pictures may be analyzed, it should be clear to anyone skilled in the art that words which make multiple appearances in textual assets which are a priori given high weightings make good candidates for keywords, while words that may often appear in lower weighted assets may also make good keywords.
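One simple way such a weighted word-frequency analysis might look is sketched below. The disclosure deliberately leaves the analysis open, so everything here is an assumption for illustration: the function name, the asset-type labels, the stop-word list, the tokenization rule, and the weights.

```python
import re
from collections import Counter

# Hypothetical list of common non-descriptive words to discard.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def score_keywords(textual_assets, weights, stop_words=STOP_WORDS):
    """Accumulate weighted word counts over (asset_type, text) pairs."""
    scores = Counter()
    for asset_type, text in textual_assets:
        weight = weights.get(asset_type, 1.0)   # unlisted asset types weigh 1.0
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in stop_words:
                scores[word] += weight
    return scores


assets = [
    ("filename", "bogart_casablanca.jpg"),
    ("page_text", "A photo of Bogart in the film"),
    ("page_text", "Bogart and Bergman"),
]
ranked = score_keywords(assets, {"filename": 3.0}).most_common()
```

In practice the file extension and similar tokens would also need filtering; the sketch keeps only the weighting idea.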

[0583] Although the above process has been presented in terms of crawling the Internet in search of matches to a single picture, that process would be extremely inefficient. Rather, an entire large collection of pictures in a Visual Key Database could be searched simultaneously using the methods of the present invention. Each image which is located and downloaded from the Internet would be matched against the entire database. When a match is found, the textual assets of the found match would be added to the textual assets of all the pictures that have been previously matched to that same picture. When a picture is downloaded that does not sufficiently match any of the pictures in the database, it may be added to the database with its associated textual assets, or, if the database is one of a fixed number of pictures, it may be discarded.

[0584] Clearly, the efficacy of this approach depends on a given picture being found multiple times on the Internet in association with different textual assets. This is probably a good assumption for those pictures which would most frequently be searched for by keywords on media search engines. The more frequently a particular image is searched for, the more popular is that image, hence the more likely it is to appear multiple times in different textual environments.

[0585] Finally, although it has not been explicitly pointed out in previous discussions, the methods of Holotropic Stream Recognition might quite profitably be employed in the above process of automatically assigning keywords to individual Internet images. It should be appreciated that the Holotropic methods work on streaming media precisely because the individual adjacent streamed images are very similar. Thus, when the individual images of a stream are converted to Index Keys by the Decision Tree and stuffed into individual bins, each bin being indexed by a different Index Key, it is not surprising to find very similar images from the same portion of a stream in a given bin. This in fact is the basis of the mechanism which is employed to create the Tropic of a given Query Stream, thus leading to its immediate identification.

[0586] Now suppose that the individual images in a sequence of images do not correspond to the frames of a movie, but rather are the individual images collected during the crawling of the Internet for any image. We can still employ the Holotropic steps of constructing a Decision Tree and sorting the individual images into bins according to their Index Keys, and it should come as no surprise that the individual images in each bin would be quite similar. The longer the Index Key, the more bins there would be, and the more similar the individual images in each bin would be. If, furthermore, we had collected the textual assets of all the images to be sorted by Holotropy in this manner, then all of the textual assets of the images in a given bin could be analyzed for multiply repeated words, and these words could then be used as keyword tags for the individual images in a given bin.

[0587] By the methods of keyword preparation described above, a preliminary keyword searchable database of media could be prepared. This preliminary database could then be further refined by an iterative method which may employ a conventional search engine. If a set of keywords is extracted from the textual assets of similar or matched pictures in our preliminary database, and if these same keywords are then entered into a conventional text based search engine, then those web pages returned by the conventional search engine are more likely to contain images which match or are similar to the images in our preliminary database than pages which are randomly searched for images. When a picture match is observed on a web page listed by a conventional search engine, the process adds the matching or similar picture, together with its textual assets, to the material previously accumulated in the preliminary database.

[0588] At each step of the above described iterative method, the database being constructed of images and their associated keywords becomes more refined because of the addition of more textual assets describing similar or matched pictures. At each stage of refinement of the database and its automatically derived keywords, the step of finding additional pages to search for additional matches using a conventional search engine becomes more refined, and the probability of finding relevant pages with matching pictures and valuable textual assets increases. Thus the entire process of automatically deriving keywords for media can be thought of as a bootstrap process, that is, a process which is capable of perpetuating and refining itself through the iterative application of its basic functional operation to the current materials in the database.

[0589] Reticle Projection

[0590] This method employs pseudo-random sequences to sample frames of transformed media. These pseudo-random sequences operate on the transformed data in a manner analogous to the optical encoding of projected images through a coded reticle; hence we refer to these steps of the following technique as the reticle projection.

[0591] The reticle projection step of the process will be described in much greater detail in the next sections, along with the subsequent steps of thresholding, sampling, shuffling, and segmenting. These steps compose a process of “image combustion” where most of the information in an image is “burned” away and that information which remains is split into k individual channels of n bits each. Because these steps and possibly the initial transformation step are so destructive of original image content, those bits which remain, although appearing so much like noise, actually encode the most primitive image structures of the transformed image input. Hence these remaining bits are descriptive not only of the input image, but of all images that share these primitive features.

[0592] An important advantage of these techniques is that although we know that the system finds similarity through the process of comparing common image features, we have no idea what those image features represent, nor do we care. We only know that the system is comparing features by observing its behavior in identifying groups of similar images out of a database of images.

[0593] Digital Audio

[0594] Building upon the methods described herein and the reticle method discussed above, an application for the archiving and retrieval of digital audio objects, using only the content of those objects, has been developed. To date, most of the practical applications of this technology have been concerned with vocal and instrumental music; however, because the application is strictly content-based, it can successfully be applied to any digital audio data.

[0595] In order to build a database of digital audio objects, a specific, proprietary algorithm is used to convert such audio objects into digital keys. An audio object is broken up into an overlapping temporal sequence of intervals. Each of those intervals is quite analogous to a frame in a sequence of digital video frames, and essentially the same Holotropic stream recognition process which has been described in that context is used to find the best match between query object and database object. However, the process of generating the decision tree from which Holotropic information flows is specific to the digital audio application.

[0596] As noted above, an audio object is broken up into an overlapping temporal sequence of intervals. Overlaps from 50 to 90 percent offer good performance. In general, less overlap results in greater processing speed, while more overlap results in more accurate identification. To date, the audio stream has been broken up into intervals against an arbitrary time reference. We intend to try to determine the placement of the intervals based upon information contained in the music itself. If we are successful in this endeavor, enhanced performance should result.

[0597] Each interval of the overlapping temporal sequence is transformed by the fast Fourier transform (FFT) into a spectrum of resulting magnitude vs. frequency. A frequency cutoff of 5.5 kHz seems to work well, and has become something of a standard.

[0598] Because of the nature of music, the magnitude associated with one frequency may typically be very much larger than the magnitude associated with another frequency and, in subsequent signal processing, may have an undue influence upon the final result. Thus, a normalizing function is applied to the power spectrum so that the resulting normalized power spectrum will be fairly uniform over the frequency range of interest. The normalizing function has been obtained by averaging the power spectra obtained from a large body of music content. We obtained our standard normalizing function by averaging the power spectra of about 20 hours' worth of music.

[0599] The normalized power spectrum is sampled uniformly to produce a vector of length 1023. This transform data vector is projected through a 1023 element reticle to generate the projection. The threshold projection is then calculated.

[0600] A fixed process is used to select 90 binary values out of the threshold projection. A selection process which selects the 90 values at the approximate boundaries between 91 approximately equal intervals has been shown to work well. These 90 values are then scrambled by a fixed pseudo-random algorithm. The result is the gene.

[0601] The gene is now divided into 10 codons of 9 bits each. The decision tree is built out of these codons, and the tools are in place to use Holotropic stream recognition for the matching of query objects and database objects.
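The selection, scrambling, and segmentation steps can be sketched as follows, starting from an already computed binary threshold projection. The exact selection positions, the scrambling algorithm, and the seed are not given in the text, so the rounding rule, the use of a seeded shuffle, and the function names below are assumptions for illustration.

```python
import random


def make_gene(threshold_projection, gene_length=90, seed=12345):
    """Pick gene_length bits at near-equal spacing, then apply a fixed scramble."""
    n = len(threshold_projection)
    step = n / (gene_length + 1)          # 91 near-equal intervals for 90 bits
    positions = [int(round(step * (k + 1))) for k in range(gene_length)]
    bits = [threshold_projection[p] for p in positions]
    order = list(range(gene_length))
    random.Random(seed).shuffle(order)    # same seed gives the same permutation
    return [bits[j] for j in order]


def split_codons(gene, codon_bits=9):
    """Pack consecutive groups of codon_bits bits into integers."""
    codons = []
    for start in range(0, len(gene), codon_bits):
        value = 0
        for bit in gene[start:start + codon_bits]:
            value = (value << 1) | bit
        codons.append(value)
    return codons
```

Each 9-bit codon takes one of 512 values, the same size as the Index Key space used in the video example.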

[0602] Performance of the system described generically above has been exemplary. Using 2-second music intervals as query objects on a database derived from over 20 hours of music, the system has matched query object to database object without error.

[0603] Digital Text

[0604] The incorporation of text stream recognition into the space of processed media inputs permits holotropic searches for textual content. For example, the lines:

[0605] All the world's a kennel,

[0606] And all the dogs and cats merely pets.

[0607] They have their exits and their entrances,

[0608] And one owner in his time opens many doors,

[0609] His acts being twentyfour hours.

[0610] may or may not be familiar to the reader of this document, but they are readily recognizable as the Shakespearean lines

[0611] All the world's a stage,

[0612] And all the men and women merely players:

[0613] They have their exits and their entrances;

[0614] And one man in his time plays many parts,

[0615] His acts being seven ages.

[0616] to those having a general familiarity with Shakespeare's plays.

[0617] When a conventional search engine is asked to search all of Shakespeare for the fictitious quote, it of course responds that it cannot find a match. When the same quote is entered into a Shakespeare-trained media content search employing the methods described in this document, it correctly states that there is no identical match but that the best existing match in all of Shakespeare is the actual Shakespearean quote above. And the system will continue to make this correct identification as the quote is further mangled with misspellings, deletions, insertions, or rearrangements.

[0618] Media Content Indexing System Description

[0619] Reference is now made to FIG. 50, which depicts a media content indexing application according to the invention. Input signal 5000 may be a portion of an audio waveform, a digital image, a frame of digital video or a phrase of text, although the parameters illustrated in FIG. 50 represent nominal parameters for the processing of audio waveforms representing high fidelity music. In the case of audio, the input waveform is preferably segmented into frames, a typical frame being 200 milliseconds. Audio frames can overlap by 50 percent; therefore, audio frames can be acquired at the rate of 10 per second. In this example, a frame is digitized to 4096 integer values, sufficient to sample up to midrange audio frequencies.

[0620] In the next step 5001, the input data frame is transformed into an auxiliary digital construct. In the case of audio illustrated here, that auxiliary construct is the Normalized Power Spectrum of the Discrete Fourier Transform (DFT), which is well known and described in numerous references on signal processing. The DFT of the audio waveform has both real and imaginary parts, and represents both the amplitude and the phase of the frequency components of the waveform. The Power Spectrum is the magnitude of each frequency component, disregarding its phase, computed as the square root of the sum of the squares of the real and imaginary components of the DFT. In the case illustrated in FIG. 50, the transform data is represented by 1023 floating point numbers which correspond to 1023 frequencies in the power spectrum. Furthermore, the Power Spectrum values are normalized over the entire set of audio input frames entered into the Digital Key database. Normalization consists of adjusting the individual frequency components of the DFT magnitudes by scaling each frequency component by the inverse of the average DFT magnitude of the frequency component for all of the input frames of all the input samples.
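The normalization step alone can be sketched as below, taking the per-frame magnitude spectra as already computed. The function name and the list-of-lists layout are assumptions, as is returning zero for a frequency whose average magnitude is zero.

```python
def normalize_spectra(spectra):
    """Scale each frequency component by the inverse of its average magnitude.

    spectra is a list of frames, each a list of magnitudes of equal length.
    """
    num_frames = len(spectra)
    num_bins = len(spectra[0])
    averages = [sum(frame[k] for frame in spectra) / num_frames
                for k in range(num_bins)]
    return [
        [frame[k] / averages[k] if averages[k] else 0.0 for k in range(num_bins)]
        for frame in spectra
    ]
```

After this scaling every frequency component averages to 1 over the database, so no single strong frequency dominates the later projection.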

[0621] In the case of a digital image input at 5000, the transform at 5001 may take four or more different forms. The first form is simply the isomorphic transform, meaning the transformed image pixel value is a function of the value of the corresponding pixel in the input image. Secondly, the transformed image may be a warp transform of the input image. The warp transform has been extensively discussed earlier in this application. Thirdly, the image transform may be a normalized two-dimensional DFT magnitude, directly analogous to the one-dimensional DFT discussed in the previous paragraph for audio input. Finally, the transformed image may be a histogram of the relative frequency of occurrence of identical m-by-n sub-images of the input image. The m-by-n sub-images are here referred to as neighborhood sub-images. For example, if the input image is binary and the neighborhood is 3-by-4, then there are 4096 possible configurations of neighborhood sub-images (2 raised to the 3×4 power). A binary digital image of 512 by 512 pixels would contain 510×509 or 259,590 discrete sub-images, each sub-image being one of the 4096 possible sub-images. Thus the transformed data at 5001 would represent the normalized frequency of occurrence of each of the 4096 possible 3×4 binary sub-images. Each frequency of occurrence may be normalized by scaling by the inverse of the expected value of each sub-image frequency computed over all sub-images of all input images to the Digital Key database. Other methods of scaling are discussed later.
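The neighborhood sub-image histogram for a binary image can be sketched as follows. The function name, the row-major bit packing used to number the configurations, and the nested-list image layout are assumptions; the normalization against the database is omitted.

```python
def neighborhood_histogram(image, m, n):
    """Count occurrences of every m-by-n binary sub-image configuration."""
    rows, cols = len(image), len(image[0])
    counts = [0] * (2 ** (m * n))
    for r in range(rows - m + 1):
        for c in range(cols - n + 1):
            code = 0
            for dr in range(m):              # pack the neighborhood bits row by row
                for dc in range(n):
                    code = (code << 1) | image[r + dr][c + dc]
            counts[code] += 1
    return counts
```

For a 512-by-512 image and a 3-by-4 neighborhood the two outer loops visit (512−3+1)×(512−4+1) = 510×509 = 259,590 positions, matching the count given above.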

[0622] An image input at 5000 represents a single input frame, whereas if the input were a digitized video then it would be represented by a series of frames. Each frame of video is entered into the database in the same manner as a frame representing a still image.

[0623] The input at 5000 may also be a string of text or other alphanumeric symbols, represented by their ASCII values or any other recognized character-to-byte or character-to-binary word mapping. In a straightforward variation, the input can be a string of words, each word of the recognized set being converted to an integer representing its index in a word dictionary. The input strings can be of any length, but similar to the audio case, the input string is preferably subdivided into overlapping frames, each frame representing a given number of words or characters of the input string. However, it is also possible to have textual inputs of a single frame.

[0624] Transformation of input text into the auxiliary construct 5001 may be appreciated by its similarities to the neighborhood sub-image transformation of still images. For example, an input frame of 512 characters may be considered to be a sequence of 511 overlapping 2-tuples, each 2-tuple being 2 successive characters. Likewise, the input frame of 512 characters may be viewed as 512−n+1 successively overlapping n-tuples, each n-tuple being a succession of n characters. This example corresponds to an n-by-n neighborhood sub-image in the case of still images. For an alphabet of m possible characters, there are m^(n) different n-tuples. For example, if we restrict ourselves to a lower case alphabet of 26 characters plus a space character, then there are 27^(n) possible n-tuples. Once again, the transformed text data may be in the form of a histogram of the normalized frequency of occurrence of each possible n-tuple, where normalization is accomplished by scaling each histogram component by the inverse of the relative frequency of each n-tuple in the entire data set of input frames represented in the database. Alternatively, each n-tuple frequency may be scaled by the negative logarithm to the base 2 of the inverse frequency of occurrence, which weights each histogram component by a factor representing the information content of the n-tuple within the collection of all n-tuples in the database. Finally, the histogram of n-tuple frequencies at 5001 may represent multiple values of n, for example, 2-tuples, 3-tuples, 4-tuples and 5-tuples. In this case, the histogram may be multi-dimensional, or the individual histograms for each n-tuple may be combined and added together into a single histogram by normalizing their lengths.
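The n-tuple histogram and the first (inverse relative frequency) normalization can be sketched as below. The function names and the dictionary of database-wide relative frequencies are assumptions; the sketch assumes every n-tuple in the frame has a nonzero entry in that dictionary.

```python
from collections import Counter


def ntuple_counts(frame, n):
    """Count the len(frame) - n + 1 overlapping n-tuples of a text frame."""
    return Counter(frame[i:i + n] for i in range(len(frame) - n + 1))


def normalized_histogram(frame, n, corpus_frequency):
    """Scale each n-tuple's relative frequency by the inverse of its
    relative frequency over the whole database (corpus_frequency)."""
    counts = ntuple_counts(frame, n)
    total = sum(counts.values())
    return {t: (c / total) / corpus_frequency[t] for t, c in counts.items()}
```

A 512-character frame yields 511 overlapping 2-tuples, as stated above.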

[0625] The next two steps of the input processing of digitized media signals perform an additional transformation upon the already transformed auxiliary construct of 5001. This transformation involves the projection of the vector representing individual auxiliary construct values through a weighting vector 5002 and onto a collecting screen 5003, where each element of the full projection on the screen 5003 is then composed of a unique weighting of all of the elements of the auxiliary construct 5001 vector of digital values. We have referred here to the vector of weighting elements 5002 as a reticle, owing to its similarity to the optical element of the same name employed in optical processing.

[0626] The reticle projection process may be further appreciated by reference to FIG. 51. Two stages of the process of computing the full projection are illustrated in FIG. 51. At the first illustrated stage (left), the 4th element of the full projection is calculated, while at the next stage (right), the 5th element is calculated. At the first illustrated stage of the calculation, all of the elements of the transformed data of the auxiliary construct 5100 are individually weighted by the elements of the reticle 5101, here illustrated by the elements “+” and “−” representing weightings by +1 and −1 respectively. The value of the 4th element of the full projection 5102 is then computed as the linear sum of the individually reticle-weighted transformed input data. At the next illustrated stage of the computation (right), the 5th element of the full projection 5105 is computed as the linear sum of the individually reticle-weighted elements of the transformed data 5103, where the reticle elements 5104 are rotationally shifted by one element.

[0627] Referring back to FIG. 50 and making reference to the illustrative values contained therein, a vector of 1023 floating point numbers representing 1023 discrete values of the transformed input frame 5001 is weighted by 1023 binary values, these being +1 and −1, represented by a vector of 1023 bits called a reticle 5002. There are 1023 possible such weightings, each being effected by a particular cyclic rotation of the bits of the reticle. The linear sum of each of these 1023 individual reticle weightings is recorded at the elements of the full projection 5003, the I'th element of the full projection being computed with the reticle rotated by I places.
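As a minimal sketch of this step, assuming the reticle is held as a list of +1/−1 weights and that a rotation by I places is realized by offsetting the reticle index (the direction of rotation is an assumption, as is the function name full_projection):

```python
def full_projection(values, reticle):
    """Return one reticle-weighted linear sum per cyclic rotation of the reticle.

    values  : transformed input frame (sequence of numbers)
    reticle : sequence of +1 / -1 weights, same length as values
    """
    n = len(reticle)
    return [
        # Element `shift` of the projection uses the reticle rotated by `shift` places.
        sum(v * reticle[(j + shift) % n] for j, v in enumerate(values))
        for shift in range(n)
    ]
```

Computed this way, a frame of length n costs n × n multiply-accumulate operations, which is the nested iteration referred to in connection with FIG. 54.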

[0628] An explanation is probably in order concerning the rationale for the reticle projection steps of the input process. Clearly, the reticle projection process is a mapping of every element of the transformed input data onto every element of the full projection. This step is necessary even though the transformed input data already represents a process of weighted integration over the input frame. For example, in the audio case illustrated in FIG. 50, the DFT transform computes its resulting audio frequency spectrum on an element-by-element basis by the equivalent of weighting the input frame elements by sine waves of incrementally increasing frequency. Thus, the energy of each element of the input audio waveform may be spread across the spectrum depending upon the shape of the entire waveform. It is necessary to perform this weighting and integration step for several reasons. The first is that once the input frame is transformed into its auxiliary construct, the remaining steps of the process are the same regardless of whether the input is an audio waveform, a still image, an image sequence, a character sequence or a word sequence. Since the auxiliary construct is different for each of these media, and since each medium may have multiple auxiliary constructs, the step of reticle projection provides a homogenizing of the individual characteristics of a particular transformation of a particular medium. In other words, although the characteristics of a particular auxiliary construct of a particular medium might be recognizable, once the reticle stage has performed its function, no such recognizable characteristics should exist.

[0629] Another way of phrasing this conclusion has to do with visualizing the process described here as a construction of a decision tree as described earlier in this document. The method of reticle projection is designed to yield balanced decision trees which ultimately result in superior signal-to-noise values for frame or sequence recognition. Each terminal branch of the decision tree has approximately the same number of leaves. The method of Holotropy previously described herein is optimized for this condition, where the reference scatter diagram as illustrated in FIG. 35 appears to be a random dot scatter.

[0630] This extinguishing of any remaining pattern in the auxiliary construct dictates the selection of the weighting values of the reticle. The individual weightings by the reticle should appear to be as random as possible, given the constraint that the reticle projection is not a random process by virtue of the fact that the pattern of +'s and −'s is the same for every input frame of every media sequence for any media. Rather, the reticle pattern is a pseudo-random sequence. One such class of pseudo-random sequences is the so-called maximal length shift register sequences. Although the methods described herein may make use of other pseudo-random sequences, the discussion from here on will focus on maximal length shift register sequences, so named for the manner in which they are generated. For a further discussion of maximal length shift register sequences, see http://support.xilinx.com/xapp/xapp210.pdf.

[0631] The full projection is represented in the illustrated audio case of FIG. 50 by 1023 floating point numbers. Each of these 1023 numbers is a pseudo-random combination of +1 and −1 weighted elements. Ideally, the expected value of a full projection element is 0, the number of +'s and −'s eventually balancing. Thus the step of thresholding each element of the full projection 5004 is one of preserving a full bit of information for each bit of the 1023 bit thresholded projection 5004.

[0632] It is interesting to note that the computationally intensive process of computing the full projection may be implemented optically. In the optical implementation of the full projection step, all elements of the full projection vector are calculated in parallel in a single step. That compares very favorably to the nested iterations of the full projection as computed digitally (see FIG. 54). This may be an important computational alternative when the inputs to the system are high resolution images, i.e., on the order of 5000-by-5000 pixels, as might be required in a secure document identification system. Currently, for the cases studied in this disclosure, input images have to be closer to 100-by-100 in order to sustain near real-time functionality on contemporary desktop computers.

[0633] FIG. 60 illustrates the optical reticle projection concept in a single dimension. The image 6001 of FIG. 60 is formed as a slide and illuminated with monochromatic diffuse light 6000. Here, the image referred to is the transformed image of the auxiliary construct previously discussed. The reticle mask 6002 is a complex spatial filter whose individual elements weight the transmitted light rays 6003 from the display by +1 or −1, the −1 weighting being accomplished by 180 degree phase shifting of the ray 6003. The detector for this one-dimensional case is ideally a linear array 6004. Rays combining on a given element of the detector array necessarily pass through different portions of the reticle, rays from adjacent pixels of the image source passing through adjacent pixels of the reticle. Note that the size of the image, reticle and detector are the same, but the resolution of the reticle is twice that of the image or detector.

[0634] FIG. 61 illustrates this basic configuration for two-dimensional images and reticles. A monochromatic diffuse light source 6101 illuminates an image slide 6102, the transmitted rays 6105 being passed by the reticle 6103 either unaffected or phase shifted 180 degrees so that unshifted and phase shifted rays destructively combine at the detector array 6104.

[0635] FIGS. 62A-62E illustrate the specific example of a 7-by-9 reticle implemented as an optical reticle mask. The numbers in FIGS. 62B-62E represent individual pixels of the reticle, and weight transmitted light rays by +1 or −1. Here we see that shifts of the reticle position by plus or minus one pixel horizontally effect single pixel cyclical rotations of the entire reticle code through the 7-by-9 reticle block as illustrated by 6201, 6202 and 6203 respectively. Vertical single pixel shifts effect 7 pixel cyclical shifts within the 7-by-9 block as illustrated by 6204.

[0636] Leaving optical computation of the reticle projection, the next step of the audio process illustrated in FIG. 50 results in a significantly reduced bitstring representation of the input frame. Here, the sampled projection 5005 is but 90 bits long, although it might be as short as 8 bits as was the case of video holotropy discussed earlier, or as long as the full projection, which offers significant recognition advantages when the number of input frames is severely limited. In keeping with our notion of destroying patterns in the stored representation of the frames, the 90 bits are pseudo-randomly sampled and shuffled from the 1023 bits available in the thresholded full projection 5004.

[0637] The sampled and shuffled bits 5005 are partitioned into segments of equal length, which may be anywhere from about 8 bits to perhaps 32 bits or more, depending upon the anticipated maximum size of the database being accumulated, where the size of the database is measured as the total number of input frames of media data indexed in the database.
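The sampling, shuffling and partitioning steps might be sketched as below. The disclosure does not say how the pseudo-random sampling order is produced, so the use of a seeded random.Random generator, the seed value, and all function names here are assumptions made purely for illustration. Whatever order is chosen must remain fixed for the life of the database.

```python
import random


def make_sampling_order(total_bits=1023, gene_bits=90, seed=0):
    """Pick gene_bits distinct positions of the thresholded projection, in shuffled order.

    Assumption: a fixed seed stands in for whatever fixed pseudo-random
    order the system actually uses.
    """
    rng = random.Random(seed)
    return rng.sample(range(total_bits), gene_bits)


def sample_gene(thresholded_bits, order):
    """Read the thresholded projection at the sampled positions."""
    return [thresholded_bits[i] for i in order]


def partition_codons(gene_bits, codon_len=9):
    """Cut the gene into equal-length segments (codons)."""
    return [gene_bits[i:i + codon_len] for i in range(0, len(gene_bits), codon_len)]
```

With the illustrative values of FIG. 50, a 90-bit gene partitions into 10 codons of 9 bits each.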

[0638] It is not unreasonable to think of these sampled and shuffled bits 5005 as a gene, since they represent the genotype, i.e., the input frame, in the database. Extending this analogy to the fixed length segments of the gene, these must be analogous to the codons of fixed length sequences of amino acids. However, in this case it is the frame that describes the gene, and not the other way around. This is another way of saying that the frame cannot be reconstructed from the gene, which contradicts the genetic analogy.

[0639] Other names for the gene and codon have previously been used in this disclosure. The codons are recognizable as the index keys of video holotropy previously described. The gene corresponds to the previously discussed index key vector. Another useful analogy is the fingerprint, which represents a recognizable trace of the entire individual: the pattern of ridges of moisture left behind by a touch. Another analogy is digital ash, the end product of the complete annihilation of pattern and structure.

[0640] The remainder of the input process is essentially the same as the holotropic processing previously discussed in the processing of video data. Each of the 10 codons, or index keys in the gene, here illustrated by a 9-bit bitstring, represents the input frame in the database. The input frame is identified by its position in the input sequence of frames entering the database. In the audio example of FIG. 50, a 32-bit binary word counts the input frames. Each of the 10 codons or index keys generated, being a 9-bit bitstring, indexes 512 possible lists for accumulating frame numbers. Thus, if the input frame was number 32456, then the first codon 5010 in the audio example, specifically 101010101, adds the number 32456 to its list index 341, illustrated at 5011 in FIG. 50. The second codon 5012 of FIG. 50, namely 000011111 in the audio example, adds the number 32456 to its list index 31, and so forth, the frame number 32456 being added once to each of the 10 bins 5006, each addition of a frame number to a bin being directed to the list whose index is specified by the codon bitstring.

[0641] This completes the overview of the process of entering media data into the database and creating an index of media contents. To summarize this process, sequentially presented input frames of media are numbered and transformed, first into an auxiliary construct, then into a full reticle projection, then into a gene sequence of codons by thresholding, sampling, shuffling and partitioning. Associated with each codon in the gene is a bin, much like a filing cabinet. Each bin has a number of drawers, equal to the number of possible codon values. When frame number N is entered into the system, each codon places a card with the number N at the back of the drawer determined by its value. Quite naturally then, the frame numbers in each drawer are arranged in ascending order.

[0642] Now we will review from holotropy the process of identifying an unknown query frame, using this analogy of filing cabinets and drawers. Query identification makes use of another auxiliary construct, in particular, a histogram, which, for our purposes here, can be likened to a row of boxes, each box able to accommodate a pile of frame cards drawn from the filing cabinet drawers. The histogram boxes are labeled from 1 to M, where M is the total number of frames entered into the system.

[0643] As a first example, suppose the media input to our system are all digitized still images, there being one million such images entered. Since a still image is a single frame and we can't rely on the presence of additional sequential frames for enhancing identification, we would probably want a long gene of codons, so imagine we have a gene of a hundred 9-bit codons. Then we imagine a hundred filing cabinets, each filing cabinet having 512 drawers, and each drawer containing approximately 1953 cards. The average number of cards per drawer is arrived at by the fact that the total number of cards in each filing cabinet is one million, while the number of drawers per filing cabinet is 512.

[0644] Now we present a query digitized still image to this system. The query image generates a gene in precisely the same way as all of the previously entered images. This query gene is similarly partitioned into a hundred codons having 512 possible values each. Suppose that the value of the first codon is 411. Then we go to drawer number 411 in the first filing cabinet and remove all of the approximately two thousand cards in the drawer. Then we sort the cards into the million labeled histogram boxes, placing the card marked N into the box labeled N. Next, we remove the contents of drawer number K where K is the value of the second codon of the query gene, and likewise we sort this drawer of cards into the boxes. We proceed with each codon until we have emptied and sorted through 100 drawers.

[0645] As a final step of the identification process we count the number of cards in each histogram box. Suppose that histogram box J has the maximum number of cards. Then we identify the query as being most like image J of the million input images.

[0646] If the query image is identical to the image J previously learned, then the total number of cards in the box with the maximum number of cards will be 100, since the query gene will match, codon for codon, the gene for the J'th inputted image. But if the query image is not an exact match of any of the learned images, but is still more similar to image J than any other learned image, then the number of cards in the J'th box will be a maximum but generally will be less than 100, there being fewer cards the more dissimilar the query image is from the learned image J. For if we examine the gene of a query image that is similar but not identical to image J, we find that some number of bits in the gene have been mutated, that is, their binary states have been complemented owing to the dissimilarities of the query image and learned image J. Each of the 100 codons may contain a mutated bit, in which case it will select a different drawer from its filing cabinet than did the codon generated from learned image J. This “wrong” drawer will similarly contain about two thousand cards, but they will distribute themselves randomly amongst the million boxes. Now if mutations occur in half the query gene's codons, then the maximum box will have 50 cards, while the average number of cards in all the other boxes will be approximately 0.1, i.e., 50 drawers of approximately 1953 cards each equals approximately 100,000 cards randomly distributed over one million boxes. Similarly, if only ten codons of the query gene are unmutated, then the histogram maximum will be 10 while the average number of cards in the other histogram boxes will be approximately 0.18.
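The arithmetic behind these figures can be checked with a few lines. The constants are the illustrative values of this example (one million frames, one hundred 9-bit codons); the function name expected_background is hypothetical, and the calculation assumes cards from mismatched drawers fall uniformly over the boxes.

```python
FRAMES = 1_000_000        # images entered into the system
CODONS = 100              # codons per gene
DRAWERS = 2 ** 9          # possible values of a 9-bit codon

CARDS_PER_DRAWER = FRAMES / DRAWERS   # 1953.125, i.e. "approximately 1953"


def expected_background(unmutated_codons):
    """Average number of cards per box contributed by the mismatched drawers."""
    wrong_drawers = CODONS - unmutated_codons
    return wrong_drawers * CARDS_PER_DRAWER / FRAMES


print(expected_background(50))   # 0.09765625, approximately 0.1
print(expected_background(10))   # 0.17578125, approximately 0.18
```

Even with only ten surviving codons, the peak of 10 stands well above a background averaging less than 0.2 cards per box.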

[0647] Finally, we look at the process of query identification when the input media is a sequential type such as a video stream, an audio stream, or a text string. Again we'll assume that we have inputted one million frames of sequential media, which, if the media were audio at 50 frames per five seconds, might represent about 333 five minute songs, about 28 hours worth. Although we might be satisfied with only knowing which song a five second audio snippet comes from, we might also desire to know not only the identity of the song, but where in the song the snippet comes from. Since the length of the snippet is known, the position and identity of the snippet is determined by which frame of the one million learned frames the snippet starts on. This, of course, assumes we have cataloged the beginning and ending frame numbers for all the songs we have entered.

[0648] Because a five second audio snippet typically consists of 50 frames, there is probably no need of a gene from a single frame having 100 codons. From our audio example of FIG. 50, assume that the gene is 90 bits long, arranged as 10 codons of 9 bits each. Now we will proceed to show how the audio query snippet is processed using our filing cabinet and histogram boxes analogy.

[0649] As in our previous example, each drawer of our filing cabinets averages 1953 cards or approximately two thousand. The first frame of the 50 frame audio query snippet generates its gene of ten 9-bit codons. The contents of the ten codon-specified drawers are then sorted amongst the million labeled histogram boxes. The next frame of the query sequence is now presented and generates its gene. But before we sort the contents of the codon-specified drawers, we shift the labels of the histogram boxes one box in the direction of increasing box labels. Thus box 2 is now box 1, box 501 is now box 500, and so forth, adding an additional box to the end of the histogram row of boxes. Now we sort the cards in the drawers specified by the codons generated from the second frame of sequential input. This process of gene generation, histogram box label shifting, and card sorting is repeated until all of the query sequential frames have been entered. Once again, the desired answer is the label of the box with the largest number of cards.
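One way to picture the label shifting in code is to let every card drawn for the query frame at offset t vote for the starting frame N − t, so that all frames of a correctly aligned snippet pile their cards into the same box. This reading of the shift direction is an assumption, as are the names build_catalog and locate_sequence and the representation of a gene as a list of integer codon values.

```python
from collections import Counter, defaultdict


def build_catalog(genes):
    """genes[f] is the list of codon values for learned frame number f."""
    bins = defaultdict(list)   # (codon position, codon value) -> frame numbers
    for frame, gene in enumerate(genes):
        for pos, value in enumerate(gene):
            bins[(pos, value)].append(frame)
    return bins


def locate_sequence(bins, query_genes):
    """Return the learned frame number on which the query snippet most likely starts."""
    votes = Counter()
    for offset, gene in enumerate(query_genes):
        for pos, value in enumerate(gene):
            for frame in bins.get((pos, value), []):
                # Relabeling the boxes: card N drawn at query offset t votes for N - t.
                votes[frame - offset] += 1
    return votes.most_common(1)[0][0]
```

A 50-frame snippet with ten codons per frame can place at most 500 cards in the winning box, which is why a short gene suffices for sequential media.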

[0650] This completes the introductory description of the operation of the media content indexing system. We now proceed with detailed flow charts and discussions of the principal steps of the system's operation.

[0651] Flowchart Descriptions

[0652] We now present details of the principal steps of the operation of the media content indexing system. These steps are described in depth by detailed flowcharts and discussion of the major elements of these flowcharts. We begin by discussing the construction of the reticle used in the reticle projection step of processing. The reticle we have chosen to implement is based on the well known family of pseudo-random sequences called “maximal length shift register sequences”. We describe in detail now the method of generating maximal length shift register sequences, for purposes of completeness of the discussions herein. We make no claims of originality for the materials discussed in the next section, Setup of the Reticle.

[0653] Setup of the Reticle

[0654] The construction of the reticle is described in FIG. 52: Setup the Reticle 5200. The reticle is set up using predefined default values. Different values may be used to achieve different performance results, but they must be consistent between those used in producing the reference data and those used to process queries. In these examples, the input vector, the reticle and the gene all have the same length. This need not be the case, as the example of audio processing in FIG. 50 illustrates.

[0655] The reticle is built using a shift register. The basic action of a shift register is illustrated in FIG. 59: Shift Register. The shift register has a number of taps 5201, determined by the length of the reticle according to the table below. With a reticle of length 4095 (2¹²−1), where n is 12, we'll use 4 taps. Note that the last position is always included as one of the taps.

n     TAPS
3     3, 2
4     4, 3
5     5, 3
6     6, 5
7     7, 6
8     8, 6, 5, 4
9     9, 5
10    10, 7
11    11, 9
12    12, 6, 4, 1
13    13, 4, 3, 1
14    14, 5, 3, 1
15    15, 14

[0656] Table of Reticle Taps

[0657] The shift register is initialized to a BitArray of length n (12) containing zeros 5202; a 1 is placed in position n 5203. Then a DO loop is established to produce the bits of the reticle, looping for i=1 to the length of the reticle 5204. The variable bit is established and initialized to zero 5205. Then a DO loop is established to go through each of the tap positions 5206. An XOR function sets the bit by comparing bit with each of the tap positions in turn 5207. At every step, if the value of bit and the tap are the same, it sets bit to 0; if they are not the same, it sets bit to 1.

[0658] Then the reticle sequence at position i is set to the value of bit 5209. The shift register is shifted by one 5210, and the first element is replaced by the value of bit 5211. When this has been done for every position of the reticle, the DO loop is ended and the reticle is complete 5212.
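The reticle setup just described can be sketched as follows, under stated assumptions: positions are 1-based as in the flowchart, the shift moves every element one position toward position n so that the new bit enters at position 1, and the reticle is returned as 0/1 bits (mapping them to −1/+1 weights is left to the caller). The names TAPS and make_reticle are illustrative only.

```python
# Tap positions (1-based) from the Table of Reticle Taps.
TAPS = {
    3: (3, 2), 4: (4, 3), 5: (5, 3), 6: (6, 5), 7: (7, 6),
    8: (8, 6, 5, 4), 9: (9, 5), 10: (10, 7), 11: (11, 9),
    12: (12, 6, 4, 1), 13: (13, 4, 3, 1), 14: (14, 5, 3, 1), 15: (15, 14),
}


def make_reticle(n):
    """Generate a shift register sequence of length 2**n - 1 as a list of 0/1 bits."""
    register = [0] * n
    register[n - 1] = 1                    # a 1 is placed in position n
    reticle = []
    for _ in range(2 ** n - 1):
        bit = 0
        for tap in TAPS[n]:
            bit ^= register[tap - 1]       # XOR bit with each tapped position in turn
        reticle.append(bit)
        # Shift by one; the first element is replaced by the value of bit.
        register = [bit] + register[:-1]
    return reticle
```

For n = 10 this produces the 1023-bit reticle length used in the audio example of FIG. 50, with 512 ones and 511 zeros, so the +1 and −1 weightings are almost exactly balanced.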

[0659] As previously emphasized, the reticle setup remains fixed for the life of the media content indexing system. Changing the reticle construction after media data has been collected and indexed will destroy the system's functionality.

[0660] Computing the Transform of the Input Data (the Auxiliary Construct)

[0661] Many of the details of computing the transform of the Input Data (the Auxiliary Construct) have been presented earlier or are well known in the literature. For example, the warp grid transform for still image data has been dealt with in this document in abundant detail. Other transforms that the system may employ are well known and require no additional discussion. As an example, the Discrete Fourier Transform (DFT), also referred to here as the Fast Fourier Transform (FFT), is well known to anyone skilled in the art and requires no further explanation. Other transforms, notably the Neighborhood Frequency-of-Occurrence Transform for Still Images and the N-Tuple Frequency-of-Occurrence Transform for Text Sequences, have been dealt with in sufficient depth in the previous discussion of system operation that no further details are needed at this time to enable anyone skilled in the art to implement them.

[0662] Compute Gene

[0663] As illustrated in FIG. 53: Compute Gene, to compute the gene 5300, we simply compute the projections from the transformed input data 5400, then use the projections to set the nucleotides (bits) of each of the gene's codons 5500. Each of these operations is described separately.

[0664] Compute Projections

[0665] In FIG. 54 the projections are computed. At 5400, we start with a new vector (projections), which will be filled in by processing the transformed input data through the reticle 5401. A DO loop is established to set the values of all elements of projections, looping for k=1 to the size of projections 5402. Variable total is initialized to 0, and sequenceIndex is initialized to the corresponding k^(th) position of the reticle 5403.

[0666] Then a DO loop is established to go through all of the elements of the transform data vector (inputVector), looping for j=1 to the size of inputVector 5404. At each step, if the reticle at the sequenceIndex is 1 5405, total is incremented by the value of inputVector at j 5407. Otherwise, total is decremented by the value of inputVector at j 5406. In other words, the value is either added to or subtracted from total; then the sequenceIndex is incremented 5408. Next the sequenceIndex is checked to see if we've come to the end of the reticle 5409; if so, sequenceIndex is reset to 1 5410. When all elements of the inputVector have been processed, this DO loop is ended 5411.

[0667] Then the k^(th) element of projections is set to total 5412. When all elements of projections have been set, the DO loop is ended 5413, and the projections vector is returned 5414.

[0668] Set Nucleotides from Projections

[0669] In FIG. 55, each of the bits within a codon (referred to as nucleotides) is set from the projections 5500. To do so, a DO loop is established for k=1 to the size of the gene 5501. First we check to see if projections at k is greater than the pre-determined threshold for k 5502. Normally, this threshold is 0 for all k, but in general, it may be set to any value. If the threshold is exceeded, the variable j is set to 1 5504; otherwise, it is set to 0 5503. Then the k^(th) nucleotide of the indicated codon within the gene is set to j, using the function atBit:put: (described separately) 5600. When all nucleotides have been set, the DO loop is ended 5506.

[0670] The AtBit:Put Method

[0671] Once the value of a bit in the gene is determined, we have to determine which codon of the gene is affected, and which bit of that codon to set to the determined value. This is illustrated in FIG. 56, beginning at 5600. The variable codonIndex is derived from the integer quotient (//) to locate the proper codon, and the variable bitIndex is derived from the integer remainder (\\) to locate the proper bit within that codon 5601. The variable oldCodon holds onto the existing codon 5602. It is then used to build newCodon by replacing the existing value at bitIndex with j 5603. Then newCodon is plugged into the gene in the proper position indicated by codonIndex 5604.
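A minimal sketch of thresholding and bit placement follows. It assumes that the quotient and remainder are taken of the bit position divided by the codon length, uses 0-based positions, stores each codon as an integer, and treats bit 0 of a codon as its most significant bit; none of these details is fixed by the text, and the names at_bit_put and set_nucleotides are illustrative.

```python
def at_bit_put(gene, k, j, codon_len=9):
    """Set bit k (0-based) of the gene to j; gene is a list of integer codons."""
    codon_index, bit_index = divmod(k, codon_len)   # integer quotient and remainder
    old_codon = gene[codon_index]
    mask = 1 << (codon_len - 1 - bit_index)         # assumption: bit 0 is most significant
    new_codon = (old_codon | mask) if j else (old_codon & ~mask)
    gene[codon_index] = new_codon


def set_nucleotides(projections, thresholds=None, codon_len=9):
    """Threshold each projection element and pack the resulting bits into codons."""
    n_codons = len(projections) // codon_len
    gene = [0] * n_codons
    for k in range(n_codons * codon_len):
        threshold = 0.0 if thresholds is None else thresholds[k]
        at_bit_put(gene, k, 1 if projections[k] > threshold else 0, codon_len)
    return gene
```

For instance, nine projection values alternating in sign, starting positive, give the bit pattern 101010101, which is the codon value 341.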

[0672] Add Gene (5700)

[0673] When a frame of a media file is learned, a gene representing it is added to the Media Catalog, as illustrated in FIG. 57. A DO loop is established to go through the gene and add the frame number F to the list indicated by the value of each of its codons in the appropriate bin 5701. There is one bin for each of the codon positions in a gene. The variable k becomes the bin corresponding to the codon position j 5702. The variable h becomes this codon's value at codon position j 5703.

[0674] Each of the lists in k corresponds to a particular codon value h, and each list in k contains frame numbers in the reference space that have this value in k's codon position j. The variable list represents the list from the bin k with the value h 5704. To this list is added the frame number F 5705. In this way we have added this frame number to the appropriate bin for this codon position, and we continue on to the next codon position j. When all codon positions have been addressed, the DO loop is ended 5706.
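The Add Gene step might look like the following sketch. The class name MediaCatalog, its attributes, and the choice to number frames from 0 inside the catalog are assumptions for illustration; a gene is taken to be a list of integer codon values.

```python
class MediaCatalog:
    """One bin per codon position; each bin holds one list per possible codon value."""

    def __init__(self, n_codons=10, codon_len=9):
        self.bins = [[[] for _ in range(2 ** codon_len)] for _ in range(n_codons)]
        self.frame_count = 0

    def add_gene(self, gene):
        """Append the next frame number F to the list selected by each codon."""
        frame = self.frame_count
        for j, h in enumerate(gene):      # j: codon position, h: codon value
            self.bins[j][h].append(frame)
        self.frame_count += 1
        return frame
```

Because frame numbers are appended in the order in which frames arrive, every list stays in ascending order without any sorting.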

[0675] Histogram for Catalog

[0676] When a query frame is compared to the Media Catalog (MC), a histogram (H) is prepared, as shown in FIG. 58. The new histogram (H) is initialized to have the same number of frames as the media catalog 5801. Then a DO loop is established to go through each of the codons in the query gene, looping for j=1 to the number of codons in the gene 5802. For each codon, there is a corresponding bin in MC, represented by the variable k 5803. The variable h is assigned the value of the codon at index j 5804. The variable list represents the particular list in the bin k that has index h the same as codon value h 5805. If that list is not empty 5806, a DO loop is established to go through all the frame numbers in that list 5807 and increment that frame number in the histogram (H) 5808. When all frame numbers in that list have been so processed, the DO loop is ended 5809. When this has been accomplished for all codons (j), the outer DO loop is ended 5810, and the histogram (H) is returned 5811.
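The histogram step can be sketched as below, assuming each bin is a mapping from codon value to a list of frame numbers and that frames are numbered from 0. The names histogram_for_catalog and best_match are hypothetical.

```python
def histogram_for_catalog(bins, n_frames, query_gene):
    """Count, for every learned frame, how many query codons select a list containing it.

    bins       : one dict per codon position, mapping codon value -> list of frame numbers
    n_frames   : number of frames in the media catalog
    query_gene : list of integer codon values for the query frame
    """
    hist = [0] * n_frames
    for j, h in enumerate(query_gene):
        for frame in bins[j].get(h, []):   # an absent list is treated as empty
            hist[frame] += 1
    return hist


def best_match(hist):
    """Frame number whose histogram box holds the most cards."""
    return max(range(len(hist)), key=hist.__getitem__)
```

The frame returned by best_match corresponds to histogram box J in the filing cabinet analogy.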

[0677] In conclusion, whereas the technology disclosed herein may be used to eliminate the need for the keyword tagging of media in order to make it searchable, the same inventive methodologies may ultimately enable conventional search engines to effectively locate media using keywords. Thus, Visual Key technology may directly empower alternative conventional media search methodologies.

[0678] Fields of Use

[0679] The core pieces of the Visual Key Database technology herein described, image recognition and large database searching, have innumerable applications, both separately and in concert with each other. The applications can be categorized in any number of ways, but they all fall into the following four basic functional categories:

[0680] 1. Identification

[0681] Any application that is required to automatically identify an object by its visual appearance, including its size and shape and the appearance of the colors, shapes and textures composing its surface. Objects may be unique (one-of-a-kind), multiply copied or mass-produced. Objects may be two- or three-dimensional. Objects may be cylindrical, round, multiply sided, or irregular.

[0682] 2. Information Retrieval

[0683] Any computer application that is required to obtain detailed information about an object, namely the in-hand object that the user presents to the camera attached to the computer. Information about a unique object might include its value, authenticity, ownership, condition, and history. Information about a multiply copied or mass produced object might include its manufacturer, distributors, availability, price, service, instructions-for-use and frequently-asked-questions.

[0684] 3. Tracking

[0685] Any computer application requiring that an object be automatically identified and tracked, tracking involving a continuous visual monitoring of its position, distance and orientation. Tracking is essential to automated unfixtured material handling.

[0686] 4. Analysis & Inspection

[0687] Any application requiring that quantitative information be obtained from the appearance of an object, or an application that requires that an object be compared to a standard and the differences determined both qualitatively and quantitatively. Manufactured objects include any commercially available product or product packaging that can be successfully imaged in the user's working environment. Representative manufactured objects include pharmaceuticals, toiletries, processed food, books/magazines, music CD's, toys, and mass market collectibles. Large manufactured objects like cars and appliances could be imaged at various positions along assembly lines for identification and inspection.

[0688] Products or product components in the process of being manufactured constitute an appreciable number of applicable objects. This list is very long, and includes electronic components, automotive components, printed media and processed food packages.

[0689] One-of-a-kind objects include custom made antiques, jewelry, heirlooms and photographs. One-of-a-kind objects in commercial venues might also include microscope specimens, manufacturing prototypes, tools and dies, moulds, and component parts. One-of-a-kind objects from nature might include biological and geological specimens, insects, seeds, leaves and microbes. Insects, seeds, and leaves might constitute multiply-copied objects, depending on how tightly the object boundaries are drawn.

[0690] Identification

[0691] Copyright Protection

[0692] As broadband Internet connectivity becomes increasingly prevalent in the marketplace, distributing copyrighted items such as images, movies or music (audio content) via the Internet is going to increase dramatically. However, this mode of distribution will never reach its full market potential until companies can feel confident that their proprietary materials are protected from illegal copying and re-distribution. Visual Key's technology will enable companies to protect their materials, allowing them to create Visual Keys for all their proprietary materials (including individual images, video clips and audio clips) and to automatically crawl the Internet, identifying illegal users of their materials.

[0693] Audio Recognition

[0694] By analyzing the sonic waveform produced by an audio stream, Visual Key technology can be used to identify audio clips, including pieces of music, newscasts, commercial sound bites, movie soundtracks, etc. The specific uses of this application of the technology include copyright protection and database searching and verification.

Content verification for streaming media

[0695] Streaming media providers can use Visual Key to monitor the quality of their services. The Streaming Media Provider would maintain a Visual Key Database. Customers' computers receiving the streaming media can process the decoded streaming media into one or more Visual Key Database objects. These Visual Key Database objects can be returned to the media provider for verification that the correct content is being received in its entirety, along with other information about speed of reception, packet loss, etc.

[0696] Content Blocking

[0697] This is an application that would allow a consumer to block content as it is received, based upon the recognition of a video or audio stream in concert with an independent rating service. This service is currently being performed by blacklisting specific file names and web site locations (URLs), rather than basing the blocking on the content itself. With Visual Key technology, the actual stream would be identified, rather than the somewhat arbitrary measure of the location of the server or the name of the file.

Aids for Blind Persons

[0698] The Visual Key technology can be used to provide assistance to blind persons. Portable handheld and/or non-portable devices incorporating imaging capability, voice synthesis and dedicated digital processing tailored to run the Visual Key algorithms could be built at low cost. Such devices could provide useful services to the visually impaired.

[0699] Services which could be supplied include the recognition of common objects not easily identified by touch, such as medications, music CDs or cassettes, playing cards and postage stamps. The system would learn the desired objects and, via voice synthesis or recorded user voice, identify the unknown object to the user.

[0700] The system could be taught the pages of a personal telephone book, and recite the names/numbers on any page which it was shown. Consultation with blind persons could surely identify a multitude of additional applications within this context.

[0701] Information Retrieval

[0702] Personal Collections

[0703] This is an end-user software application designed for cataloging inventories of personal collectibles; it can be used independently (stand-alone) or in connection with Internet-based resources. In this application the user enters objects into the database by imaging each item with a digital input device (still camera, video camera or scanner) and enters associated information into screen forms. When the user wishes to recall information about an item that has been catalogued, the user would image it again and the Visual Key system would recall the information about that particular item.

[0704] Possible uses of this application include jewelry, coins, commemorative plates, figurines, dolls, mineral specimens, comic books and other hobby collections, as well as the cataloging of household items for insurance purposes. A further extension of the concept would be to extend the database of items to some centralized repository, where police organizations could be aided in identification of recovered stolen items.

Linking

[0705] A consumer with an appropriate imaging device such as a digital video camera or flatbed scanner connected to a Visual Key-enabled computer can use objects or pictures to provide links to specific locations within specific web sites. In the preferred embodiment, a Visual Key web site would exist as an intermediary between the consumer and desired web locations. The consumer would generate an image of an object or picture, use the Visual Key Algorithm to process the image and transmit its Visual Key to the Visual Key web site. The Visual Key web site would automatically provide the consumer with a number of linking options associated with the original image. The following illustrates a few examples of such linking.

[0706] Interactive Card Gaming and Contests

[0707] Currently, many games are played over the Internet interactively, in real-time, using screen facsimiles of playing boards and cards. For example, chess may be played over the Internet between two people in any two locations in the world, using either text-based descriptions of plays or on-screen representations of a chessboard and pieces. In some modern card games, a player's power and position in a game are partly a function of the particular cards they have actually collected (Magic Cards, for example). Visual Key technology would allow users to play using their actual deck of cards, rather than “virtual decks” constructed simply by choosing from an exhaustive list of available cards. The players put their hands down in view of digital video cameras which monitor their moves. There is no need to actually transmit two video streams, which would have prohibitively high bandwidth requirements. Instead, the identity of the cards is recognized and the application can display a representation of the cards based on that recognition. Similarly, other games and contests could be played over the Internet.

[0708] Interactive Magazine and Interactive Catalog

[0709] Visual Key technology can bridge the gap between print media and electronic media. Companies market to potential customers using a large amount of traditional print media. However, it is difficult for customers to find the correct information or purchase the marketed items from these companies on the Internet. Visual Key's technology addresses this issue by allowing a user to place a magazine or catalog page of interest under their Visual Key-enabled camera. The page is treated as a physical object and the user is provided appropriate software links to the objects pictured on the page. This allows users to find the electronic information they desire and to purchase items directly from the Internet using paper-based publications.

[0710] This has application both in consumer and in business-to-business markets. Trade publications could use Visual Key to link directly from trade publication advertisements and articles to further related information on-line.

[0711] Collectibles

[0712] The commercialization of the Internet has reinvigorated the collectibles industry. However, many people own items for which they have little information and little idea of where to find any meaningful information. People are generally interested in the history, book value, market value and general collection information for items they own. By imaging items of interest and submitting these images to query the Visual Key system, users would be able to determine what they have, what its value might be and sources for buying and selling such items.

[0713] Text-based searching can be difficult and time consuming, ultimately providing users with a large number of broad web links, often of little value. Alternatively, Visual Key-enabled digital video cameras can be used to provide users a small number of direct web links to items users have in their possession. For example, if a stamp collector imaged one of his stamps and directed the image into the Visual Key Database system, he could automatically be connected to Internet content specifically related to the stamp analyzed.

[0714] On-line Shopping

[0715] On-line shopping services and auctions could use Visual Key to add value to their offerings. Visual Key-enabled auction sites could allow users to search for items based entirely upon a picture of the object. Shopping services could maximize their search efforts by including Visual Key searching to find picture references to desired items.

[0716] Online Product Information

[0717] This is an application that would allow a consumer to quickly locate information about a product by imaging the product itself and being automatically connected to pertinent information about that product. Currently, finding specific product information requires a great deal of hit-and-miss searching.

[0718] Interactive Books

[0719] Visual Key enables some very interesting concepts for making books in their physical form a navigational tool for digital multimedia resource linking. For example, the pages of a pre-school children's book could link to sound files on the publisher's web site that read the words. Pages in books for older children could link to sound effects, animation and music. The illustrations of picture and fantasy books could navigate the reader through worlds of multimedia related experiences. Textbooks, instruction manuals, and how-to books could contain pages with graphics and text inviting the reader to experience the page multi-dimensionally by visually linking it to its associated digital multimedia resources.

[0720] Interactive Greeting Cards

[0721] Imagine opening a greeting card and seeing the Visual Key logo on the back right next to the card company's logo, indicating that the card is Visual Key-enabled. Visual object linking of the card would connect to a multimedia web site specifically designed to augment that particular card theme and design with sound effects, music, spoken words, graphics and animation. The web server could additionally record a spoken message from the sender and play back the personal greeting to the recipient if the recipient desires to reveal his or her identity.

[0722] Store Kiosks

[0723] Retail store locations could use Visual Key to provide devices (Kiosks) to help shoppers locate items within the store. For example, a Kiosk at the entrance to a store could allow a shopper to easily locate an item from a sales catalog or flyer, indicating exactly which aisle contains the item in this particular store location, whether the item is still in stock, etc. Some kinds of stores could further use the Kiosk concept to help their customers by allowing their customers to get further detailed information on items they already possess. For example, in a hobby store, a Kiosk could be used to help identify and evaluate a customer's collectible trading cards.

[0724] Physical Icons

[0725] Popular applications, games and web sites could employ physical props, like small figurines, cards or toys, as real desktop icons that could be presented to the camera for immediate linking to the associated program or web site. Physical icons would make great promotional giveaways and advertising materials for the owners of the multimedia resources to which they link. Business cards could be designed to link unambiguously to the business' web site. Computer users could associate physical items on their physical desktops with software objects like documents and folders on their virtual desktops.

[0726] Pictogram Recognition

[0727] This is an application that could be used to catalog and recall characters or symbols that are non-standard. Particularly useful to scholars, such an application could be quickly trained to recognize characters or symbols in any language or symbolic system.

[0728] Database Management

[0729] The Query portion of the Visual Key Database system description contains a section which associates a Visual Key with the Query Picture (Image Recognition) and a section in which the database is searched for this Visual Key (Large Database Searching). The database search method described (Squorging) constitutes a distinct invention that has stand-alone application.

[0730] The invention has application to the general problem of database searching. The statistical information required for this search process to be implemented (probability density functions associated with the query data that is to be matched in the database) can be known a priori or can be derived from the behavior of the data in the database itself.
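
A probability-ordered search of this general kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the Squorging method itself: keys are assumed to be tuples of integers, the error in each key component is assumed to be independent and to follow one discrete probability table (`offset_probs`), and the database is assumed to be a simple dictionary from key to record. All function and parameter names are hypothetical. Candidate keys are generated best-first with a heap, so the most probable candidate is examined first, the next most probable second, and so forth.

```python
import heapq
from math import prod


def candidates_by_probability(query, offset_probs):
    """Yield (candidate_key, probability) in non-increasing probability order."""
    # Rank the possible per-component offsets from most to least likely.
    ranked = sorted(offset_probs.items(), key=lambda kv: -kv[1])
    n = len(query)

    def probability(idx):
        # Joint probability under the (assumed) independence of components.
        return prod(ranked[i][1] for i in idx)

    start = (0,) * n                      # every component at its likeliest offset
    heap = [(-probability(start), start)]
    seen = {start}
    while heap:
        neg_p, idx = heapq.heappop(heap)
        yield tuple(q + ranked[i][0] for q, i in zip(query, idx)), -neg_p
        # Successors: move exactly one component to its next-likeliest offset.
        for pos in range(n):
            if idx[pos] + 1 < len(ranked):
                nxt = idx[:pos] + (idx[pos] + 1,) + idx[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-probability(nxt), nxt))


def search(query, index, offset_probs, max_probes=1000):
    """Probe the index with candidates in order of match probability."""
    probes = 0
    for cand, _ in candidates_by_probability(query, offset_probs):
        probes += 1
        if cand in index:
            return index[cand], probes
        if probes >= max_probes:
            break
    return None, probes


if __name__ == "__main__":
    index = {(5, 7): "stamp", (9, 9): "coin"}
    offset_probs = {0: 0.6, 1: 0.15, -1: 0.15, 2: 0.05, -2: 0.05}
    print(search((4, 7), index, offset_probs))
```

The probability table here is supplied by hand; in line with the paragraph above, it could equally be estimated from the observed behavior of data already in the database.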

[0731] Scientific Inquiry

[0732] In the areas of Biotechnology and Scientific Inquiry, there are many situations where very large databases must be searched not for an exact match but for similar items. For example, searching for likely variations of a chemical compound can involve millions of combinations. The Squorger technology could readily be adapted to these tasks.

[0733] Law Enforcement

[0734] Those in the area of Law Enforcement are frequently called upon to search large databases for close matches. For example, fingerprints, suspect profiles and DNA matching searches could all be expedited with Visual Key's Squorger technology.

[0735] Tracking

[0736] Quality Control

[0737] Industrial processes could be aided in control of process quality with Visual Key technology. By monitoring a video of a process and comparing it against a videotape of the “ideal” process, changes or alterations in the process of any visible kind could be detected and flagged.

[0738] Guidance

[0739] It is anticipated that guidance operations can be implemented through the use of Visual Key technology. For example, two parts that need to come together in a known way can be monitored and their positions altered to achieve a successful joining.

[0740] Textile Orientation

[0741] In the textile industry, before pieces are cut, the fabric must be oriented so that the patterns are properly aligned for the finished piece. Using Visual Key technology, the patterns can be automatically aligned as they are cut.

[0742] Analysis & Inspection

[0743] An important industrial application of Visual Key technology lies in the area of machine vision, i.e. the use of information contained in optical imagery to augment industrial processes. Some areas in which the Visual Key technology obviously can be applied are inspection and sorting.

[0744] Inspection

[0745] The inspection of manufactured objects that are expected to be consistent from one to another can easily be accomplished through the use of Visual Key technology. The system first would learn the image of an ideal example of the object to be inspected. Examples of the object at the boundary of acceptability would then be used as query pictures, and the resulting match scores noted. The system would thereafter accept objects for which the match score was in the acceptance range, and would reject all others. Example objects are containers, labels, and labels applied to containers.
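
The calibration procedure just described can be sketched as follows. The sketch assumes, purely for illustration, that a key is a tuple of numbers and that the match score is the Euclidean distance between two keys (smaller meaning closer); the specification's actual match score may differ. The function names are hypothetical.

```python
import math


def match_distance(key_a, key_b):
    """Assumed match score: Euclidean distance between two keys (lower is better)."""
    return math.dist(key_a, key_b)


def calibrate(ideal_key, boundary_keys):
    """Derive the acceptance limit from examples at the boundary of acceptability."""
    return max(match_distance(ideal_key, k) for k in boundary_keys)


def inspect(ideal_key, limit, query_key):
    """Accept an object whose score falls within the calibrated acceptance range."""
    return match_distance(ideal_key, query_key) <= limit


if __name__ == "__main__":
    ideal = (10, 10, 10)
    limit = calibrate(ideal, [(12, 10, 10), (10, 13, 10)])
    print(limit)
    print(inspect(ideal, limit, (11, 11, 10)))
    print(inspect(ideal, limit, (10, 10, 15)))
```

Taking the largest boundary distance as the limit is one simple design choice; a stricter limit could be chosen instead if false acceptances are more costly than false rejections.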

[0746] Sorting

[0747] Sorting is quite straightforward. As an example of sorting, let us postulate that bottles traveling upon a common conveyor are identical except for differing applied labels, and the objective is to separate bottles having N different label types into N streams, each of which contains only bottles having identical labels. The system learns N bottle/label combinations, and supplies the correct data to a sorting mechanism.
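
The sorting example can be sketched as a nearest-key classification. As before, this is an illustration under assumptions rather than the specification's method: keys are taken to be tuples of numbers, closeness is measured by Euclidean distance, and the names `nearest_label` and `sort_bottles` are hypothetical.

```python
import math
from collections import defaultdict


def nearest_label(learned, query_key):
    """Return the learned label whose key is closest to the query key."""
    return min(learned, key=lambda label: math.dist(learned[label], query_key))


def sort_bottles(learned, bottle_keys):
    """Route each (bottle_id, key) pair into the stream for its nearest label."""
    streams = defaultdict(list)
    for bottle_id, key in bottle_keys:
        streams[nearest_label(learned, key)].append(bottle_id)
    return dict(streams)


if __name__ == "__main__":
    # N = 3 learned bottle/label combinations.
    learned = {"cola": (0, 0), "lemon": (10, 0), "water": (0, 10)}
    conveyor = [("b1", (1, 1)), ("b2", (9, 1)), ("b3", (0, 8)), ("b4", (11, 0))]
    print(sort_bottles(learned, conveyor))
```

The returned mapping stands in for the data supplied to the sorting mechanism: one list of bottle identifiers per output stream.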

[0748] Other Industrial Applications

[0749] The given examples of industrial/machine vision application are representative of a plethora of such niche applications which can be identified. Although many of these applications are currently being addressed by existing technology, the use of Visual Key technology offers substantial advantages over existing technology in terms both of speed and system cost. The Visual Key technology can process images very rapidly on almost any contemporary computer, including simple single-board computers. In most applications, only a small amount of memory is required. The need for expensive frame buffers is avoided through the use of low-cost imaging cameras utilizing the Universal Serial Bus (USB), or equivalent interfaces. Finally, the installation cost of a Visual Key-based system should be comparatively low, since the learning function is so simple and straightforward.

[0750] The above descriptions of possible uses for the Visual Key technology are by no means exhaustive. Other applications include retail trade, autonomous guidance systems, advertising, and other uses for this technology. Overall, it is the intention of this invention to permit an appropriately equipped and programmed computer to perform digital media identifications similar to those that would be performed by a trained human identifier, only with a substantially greater memory for different representations and significantly faster and more reliable performance.

We claim:
 1. A method of converting a digital media object into a compact form for comparison and other purposes, comprising the steps of: a) providing a digital media object; b) associating an auxiliary construct with the object; and c) transforming the construct using one or more attributes of the object to generate a unique key representative of the object.
 2. The method of claim 1, wherein the object is a still picture.
 3. The method of claim 1, wherein the object is a sequence of pictures.
 4. The method of claim 3, wherein the sequence comprises motion video.
 5. The method of claim 1, wherein the object is derived from a waveform.
 6. The method of claim 5, wherein the waveform comprises audio.
 7. The method of claim 1, wherein the object comprises text.
 8. The method of claim 1, wherein the auxiliary construct is a grid of points, each having an initial position.
 9. The method of claim 8, wherein the transformation warps the grid, thereby moving some or all of the points to different positions.
 10. The method of claim 9, wherein the key is based on the initial and different positions of the points.
 11. The method of claim 9, wherein the key uses the vector sum of the movements made by each point during the transformation thereof for comparison.
 12. The method of claim 1, wherein the object is a still or motion picture, and the attributes include image size, shape, position, orientation, composition, color, or intensity.
 13. The method of claim 1, wherein the transformation is performed using a reticle projection.
 14. The method of claim 1, further including the step of storing the key in a database.
 15. The method of claim 1, further including the step of storing supplemental information related to the object in the database.
 16. The method of claim 1, wherein the supplemental information is content-related, contextual, or visually representational of the object.
 17. The method of claim 1, further including the steps of: providing a second object; performing steps b) and c) on the second object to generate a second key; and comparing the second key with the key in the database to determine if the objects are similar.
 18. The method of claim 1, further including the steps of: performing steps b) and c) on a plurality of objects to generate a key for each; storing the keys in a database; providing a search object; performing steps b) and c) on the search object to generate a search key; and comparing the search key to the keys stored in the database to determine if the search object is among the plurality.
 19. The method of claim 18, further including the steps of: sequentially ordering the keys in the database in accordance with a predetermined match probability prior to the step of comparing the search key.
 20. The method of claim 1, including the step of deriving the object from a book, periodical or other printed matter.
 21. The method of claim 16, wherein the object is an image derived using a camera, scanner, or other apparatus operative to convert an image into electronic form.
 22. The method of claim 1, where the object is derived from another database, software application, hyperlink or web page.
 23. The method of claim 1, further including the steps of: providing the object in the form of sequential entities; and analyzing the statistics of the construct transformations associated with the entities to generate the keys.
 24. The method of claim 1, wherein the object may not be reconstructed from the key.